A simulation study assessing the accuracy and reliability of orchidometer estimation of testicular volume

Clin Endocrinol (Oxf). 2019 Apr;90(4):623-629. doi: 10.1111/cen.13923. Epub 2019 Jan 23.

Abstract

Context: Measuring testicular volume (TV) by orchidometer is the standard method of male pubertal staging. A paucity of evidence exists as to its inter- and intra-observer reliability and the impact of clinicians' gender, training and experience on accuracy.

Objective: Prosthetic testicular models were engineered to investigate accuracy and reliability of TV estimation.

Design: Simulation study.

Setting: Conducted over the three-day 2015 British Society for Paediatric Endocrinology and Diabetes (BSPED) meeting.

Participants: Two hundred fifteen meeting delegates (161 female, 54 male): 50% consultants, 30% trainees, 9% clinical nurse specialists and 11% other professionals.

Intervention: Three child-sized mannequins each displayed a latex scrotum containing prosthetic testicles, with volumes of 3, 4, 5, 10 and 20 mL. Demographic data, paediatric endocrinology experience, TV examination training, examination technique and TV estimations were collected. Delegates were asked to repeat their measurements later during the meeting. The order of the scrota was changed daily.

Main outcome measures: Accuracy, assessed as deviation from the simulated TV; inter- and intra-observer variability.

Results: One thousand two hundred eighty-four individual estimations were obtained, and 85 participants repeated their measurements. Delegates measured TV accurately on 33.4% (±2.6) of occasions, overestimated it on 37% (±2.3) and underestimated it on 28% (±1.8) (Fleiss' kappa 0.04). The accuracy of assessing a 4 mL testis was 36%-39%. Observers underestimated its volume when it was paired with a 3 mL testis and overestimated it when it was paired with a 5 mL testis, demonstrating a tendency to impose biological symmetry. Intra-observer reliability was poor: individuals gave different estimations for the same-size testicle on 61% (±4.2) of occasions, and 20% (±3.5) of estimations were more than one size away from the previous measurement. Individuals agreed with their previous estimation on only 39% (±4.2) of occasions, irrespective of whether it was initially accurate. Training did not affect results, but experience improved accuracy.
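Fleiss' kappa compares the observed agreement among raters with the agreement expected by chance, so a value of 0.04 indicates agreement barely above chance. A minimal sketch of the standard calculation follows. It assumes that every subject (here, a prosthetic testicle) received the same number of ratings and that the categories are orchidometer sizes; the function name and table layout are illustrative, not the study's analysis code.

```python
def fleiss_kappa(counts: list[list[int]]) -> float:
    """Fleiss' kappa for counts[i][j] = number of raters placing subject i in category j.

    Every row must sum to the same number of raters n, with n >= 2.
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each subject needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])

    # p_j: share of all ratings that fall into category j.
    p_j = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]
    # P_i: proportion of agreeing rater pairs for subject i.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_subjects    # mean observed agreement
    p_e = sum(p * p for p in p_j)    # agreement expected by chance
    return (p_bar - p_e) / (1 - p_e)
```

For example, `fleiss_kappa([[3, 0], [0, 3]])` returns 1.0: all three raters agree on both subjects, while chance agreement is 0.5 because the two categories are used equally often.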

Conclusions: Overall TV estimation accuracy was poor. Considerable variation exists between and within observers. Seniority slightly improved estimation accuracy.

Keywords: experience; interobserver; intra-observer; measurement error; orchidometer; training.

MeSH terms

  • Adult
  • Anthropometry / methods*
  • Female
  • Humans
  • Male
  • Observer Variation
  • Testis / diagnostic imaging*