Quality of ethnicity data within Scottish health records and implications of misclassification for ethnic inequalities in severe COVID-19: a national linked data study

J Public Health (Oxf). 2024 Feb 23;46(1):116-122. doi: 10.1093/pubmed/fdad196.

Abstract

Background: We compared the quality of ethnicity coding within the Public Health Scotland Ethnicity Look-up (PHS-EL) dataset, and other National Health Service datasets, with the 2011 Scottish Census.

Methods: Measures of quality included the levels of missingness and misclassification. We examined the impact of misclassification using Cox proportional hazards models to compare the risk of severe coronavirus disease (COVID-19) (hospitalization and death) by ethnic group.
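
As a rough illustration of the two quality measures, the sketch below tabulates missingness and misclassification per Census ethnic group from linked pairs of codes. The definitions used here are assumptions, not taken from the study: missingness is the share of a Census group with no health-record ethnicity, and misclassification is the share of coded records that disagree with the Census. The function name `coding_quality` and the toy records are hypothetical.

```python
from collections import defaultdict


def coding_quality(records):
    """Summarise coding quality per Census ethnic group.

    records: iterable of (census_ethnicity, health_record_ethnicity),
    where health_record_ethnicity is None when no code is recorded.
    Definitions are illustrative assumptions, not the study's own.
    """
    totals = defaultdict(int)
    missing = defaultdict(int)
    mismatched = defaultdict(int)

    for census, recorded in records:
        totals[census] += 1
        if recorded is None:
            missing[census] += 1
        elif recorded != census:
            mismatched[census] += 1

    summary = {}
    for group, n in totals.items():
        coded = n - missing[group]  # records that carry any ethnicity code
        summary[group] = {
            "missing_pct": 100 * missing[group] / n,
            # Misclassification is only defined where some records are coded.
            "misclassified_pct": (
                100 * mismatched[group] / coded if coded else None
            ),
        }
    return summary


# Invented toy data, for illustration only.
toy_records = [
    ("White Scottish", "White Scottish"),
    ("White Scottish", "White Scottish"),
    ("White Scottish", "White Scottish"),
    ("White Scottish", None),
    ("Pakistani", "Pakistani"),
    ("Pakistani", "Pakistani"),
    ("Pakistani", "Other Asian"),
    ("Pakistani", None),
]

toy_summary = coding_quality(toy_records)
for group, stats in toy_summary.items():
    print(group, stats)
```

Treating the Census code as the reference standard mirrors the comparison described in the abstract; a different denominator for misclassification (for example, all records rather than coded records only) would give different percentages.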

Results: Misclassification within PHS-EL was higher for all minority ethnic groups [12.5 to 69.1%] than for the White Scottish majority [5.1%] and was highest in the White Gypsy/Traveller group [69.1%]. Missingness in PHS-EL was highest among the White Other British group [39%] and lowest among the Pakistani group [17%]. PHS-EL data often underestimated severe COVID-19 risk compared with Census data. For example, in the White Gypsy/Traveller group, the hazard ratio (HR) relative to the White Scottish majority was 1.68 [95% confidence interval (CI): 1.03, 2.74] using Census ethnicity data and 0.73 [95% CI: 0.10, 5.15] using PHS-EL data; in the Bangladeshi group, the HR was 2.03 [95% CI: 1.20, 3.44] using Census data versus 1.45 [95% CI: 0.75, 2.78] using PHS-EL data.
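
One way to read the reported intervals is to back-calculate the standard error of each log hazard ratio. The sketch below assumes the intervals are Wald-type and symmetric on the log scale, which is common for Cox models but is not stated in the abstract; the helper name `log_hr_standard_error` is hypothetical. The inputs are the figures quoted above.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def log_hr_standard_error(lower, upper, z=Z_95):
    """Approximate SE of log(HR) from a CI assumed symmetric on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


# (HR, lower, upper) as reported in the Results.
reported = {
    "White Gypsy/Traveller, Census": (1.68, 1.03, 2.74),
    "White Gypsy/Traveller, PHS-EL": (0.73, 0.10, 5.15),
    "Bangladeshi, Census": (2.03, 1.20, 3.44),
    "Bangladeshi, PHS-EL": (1.45, 0.75, 2.78),
}

standard_errors = {
    label: log_hr_standard_error(lower, upper)
    for label, (_hr, lower, upper) in reported.items()
}

for label, se in standard_errors.items():
    print(f"{label}: SE of log(HR) is about {se:.2f}")
```

Under that assumption, the PHS-EL estimate for the White Gypsy/Traveller group carries a standard error roughly four times that of the Census estimate, so the two sources differ in precision as well as in the point estimate.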

Conclusions: Poor-quality ethnicity coding in health records can bias estimates, thereby threatening the monitoring and understanding of ethnic inequalities in health.

Keywords: COVID-19; ethnicity; quality.

MeSH terms

  • COVID-19*
  • Ethnicity*
  • Humans
  • Scotland / epidemiology
  • Semantic Web
  • State Medicine