Interobserver and Intraobserver Agreement are Unsatisfactory When Determining Abstract Study Design and Level of Evidence

J Pediatr Orthop. 2022 Jul 1;42(6):e696-e700. doi: 10.1097/BPO.0000000000002136. Epub 2022 Mar 10.

Abstract

Background: Understanding the differences between types of study design (SD) and levels of evidence (LOE) is important when selecting research for presentation or publication and determining its potential clinical impact. The purpose of this study was to evaluate interobserver and intraobserver reliability when assigning LOE and SD, as well as to quantify the impact of a commonly used reference aid on these assessments.

Methods: Thirty-six accepted abstracts from the Pediatric Orthopaedic Society of North America (POSNA) 2021 annual meeting were selected for this study. Thirteen reviewers from the POSNA Evidence-Based Practice Committee were asked to determine LOE and SD for each abstract, first without any assistance or resources. Four weeks later, the abstracts were reviewed again with the guidance of the Journal of Bone and Joint Surgery (JBJS) LOE chart, which is adapted from the Oxford Centre for Evidence-Based Medicine. Interobserver and intraobserver reliability were calculated using Fleiss' kappa statistic (k). χ² analysis was used to compare the rate of SD-LOE mismatch between the first and second rounds of review.
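
As an illustration of the reliability statistic named in the Methods, the following is a minimal sketch of Fleiss' kappa for a fixed number of raters per item. It is not the authors' analysis code; the function name `fleiss_kappa`, the input layout, and the toy ratings are assumptions made for the example, and the study data are not reproduced here.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for a list of items, each rated by the same number of raters.

    ratings    -- list of lists; ratings[i] holds the category chosen by each
                  rater for item i (hypothetical layout for this sketch)
    categories -- list of all possible category labels
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])

    # counts[i][j] = number of raters who assigned item i to category j
    counts = []
    for row in ratings:
        tally = Counter(row)
        counts.append([tally[c] for c in categories])

    # Observed agreement for each item, then averaged over items
    p_items = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_items) / n_items

    # Chance agreement from the overall proportion of ratings in each category
    total = n_items * n_raters
    p_categories = [
        sum(row[j] for row in counts) / total for j in range(len(categories))
    ]
    p_expected = sum(p * p for p in p_categories)

    return (p_observed - p_expected) / (1 - p_expected)


# Toy example (made-up ratings): 2 abstracts, 3 reviewers, LOE levels 1 to 4
toy = [[3, 3, 4], [3, 4, 4]]
kappa = fleiss_kappa(toy, categories=[1, 2, 3, 4])
```

In a design like the one described, each of the 36 abstracts would be one item and each of the 13 reviewers one rater, with the statistic computed separately for LOE and SD in each round.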

Results: Interobserver reliability for LOE improved slightly from fair (k=0.28) to moderate (k=0.43) with use of the JBJS chart. There was better agreement with increasing LOE, with the most frequent disagreement between levels 3 and 4. Interobserver reliability for SD was fair in both round 1 (k=0.29) and round 2 (k=0.37). Similar to LOE, there was better agreement with stronger SD. Intraobserver reliability was widely variable for both LOE and SD (k=0.10 to 0.92 for both). When matching a selected SD to its associated LOE, the overall rate of correct concordance was 82% in round 1 and 92% in round 2 (P<0.001).
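
The descriptive labels attached to the kappa values above ("fair", "moderate") are consistent with the widely used Landis and Koch bands, although the abstract does not name the scale it applied. The sketch below assumes that scale; the function name `interpret_kappa` is hypothetical.

```python
def interpret_kappa(k):
    """Map a kappa value to a Landis and Koch agreement label.

    Assumption: the abstract's labels follow the Landis and Koch bands;
    the source does not state which scale was used.
    """
    if k < 0:
        return "poor"
    if k <= 0.20:
        return "slight"
    if k <= 0.40:
        return "fair"
    if k <= 0.60:
        return "moderate"
    if k <= 0.80:
        return "substantial"
    return "almost perfect"


# Interobserver kappa values reported in the Results
reported = {
    "LOE round 1": 0.28,
    "LOE round 2": 0.43,
    "SD round 1": 0.29,
    "SD round 2": 0.37,
}
labels = {name: interpret_kappa(k) for name, k in reported.items()}
```

Under this assumed scale, the reported intraobserver range of 0.10 to 0.92 would span "slight" to "almost perfect" agreement.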

Conclusion: Interobserver reliability for LOE and SD was fair to moderate at best, even among experienced reviewers. Use of the JBJS/Oxford chart mildly improved agreement on LOE and resulted in less SD-LOE mismatch, but did not affect agreement on SD.

Level of evidence: Level II.

MeSH terms

  • Child
  • Evidence-Based Medicine
  • Humans
  • Observer Variation
  • Orthopedics*
  • Reproducibility of Results
  • Research Design*