Ascertaining the Francophone population in Ontario: validating the language variable in health data

BMC Med Res Methodol. 2024 Apr 27;24(1):98. doi: 10.1186/s12874-024-02220-7.

Abstract

Background: Language barriers can affect health care and outcomes. Valid and reliable language data are central to studying health inequalities in linguistic minorities. In Canada, language variables are available in administrative health databases; however, the validity of these variables has not been studied. This study assessed concordance between language variables from administrative health databases and language variables from the Canadian Community Health Survey (CCHS) to identify Francophones in Ontario.

Methods: A combined Ontario sample of CCHS cycles from 2000 to 2012 (from participants who consented to link their data) was individually linked to three administrative databases (home care, long-term care [LTC], and mental health admissions). In total, 27,111 respondents had at least one encounter in one of the three databases. Language spoken at home (LOSH) and first official language spoken (FOLS) from CCHS were used as reference standards to assess their concordance with the language variables in administrative health databases, using Cohen's kappa, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV).
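
The concordance measures named above all derive from a 2x2 table that cross-classifies the administrative language variable against the CCHS reference standard. The following is a minimal sketch of those standard formulas, not the authors' analysis code. The function name `concordance`, the `Concordance` container, and the cell counts in the usage line are hypothetical and chosen only for illustration; they are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class Concordance:
    sensitivity: float
    specificity: float
    ppv: float
    npv: float
    kappa: float


def concordance(tp: int, fp: int, fn: int, tn: int) -> Concordance:
    """Agreement measures for a binary test against a reference standard.

    tp: Francophone in both the administrative data and the reference
    fp: Francophone in the administrative data only
    fn: Francophone in the reference only
    tn: non-Francophone in both
    """
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)

    # Cohen's kappa: observed agreement corrected for the agreement
    # expected by chance from the marginal totals.
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    kappa = (observed - expected) / (1 - expected)

    return Concordance(sensitivity, specificity, ppv, npv, kappa)


# Hypothetical counts for illustration only (not study data).
result = concordance(tp=40, fp=10, fn=20, tn=930)
print(result)
```

Confidence intervals, which the abstract reports alongside each estimate, are omitted here; they would need an additional method (for example, a binomial interval for the proportions).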

Results: Language variables from home care and LTC databases had the highest agreement with LOSH (kappa = 0.76 [95%CI, 0.735-0.793] and 0.75 [95%CI, 0.70-0.80], respectively) and FOLS (kappa = 0.66 for both). Sensitivity was higher with LOSH as the reference standard (75.5% [95%CI, 71.6-79.0] and 74.2% [95%CI, 67.3-80.1] for home care and LTC, respectively). With FOLS as the reference standard, the language variables in both data sources had modest sensitivity (53.1% [95%CI, 49.8-56.4] and 54.1% [95%CI, 48.3-59.7] in home care and LTC, respectively) but very high specificity (99.8% [95%CI, 99.7-99.9] and 99.6% [95%CI, 99.4-99.8]) and predictive values. The language variable from mental health admissions had poor agreement with all language variables in the CCHS.

Conclusions: Language variables in home care and LTC health databases were most consistent with the language often spoken at home. Studies using language variables from administrative data can use the sensitivity and specificity reported from this study to gauge the level of mis-ascertainment error and the resulting bias.
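
One way a downstream study might use the reported sensitivity and specificity is the Rogan-Gladen estimator, a standard correction of an observed prevalence for misclassification. The abstract does not prescribe this method, so the sketch below is an assumption about one possible use. The sensitivity (53.1%) and specificity (99.8%) are the home care values reported above with FOLS as the reference standard; the observed prevalence of 3% and the function name `corrected_prevalence` are hypothetical.

```python
def corrected_prevalence(observed: float, sensitivity: float, specificity: float) -> float:
    """Rogan-Gladen estimate of true prevalence from an observed prevalence.

    true = (observed + specificity - 1) / (sensitivity + specificity - 1)
    """
    denominator = sensitivity + specificity - 1
    if denominator <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")
    estimate = (observed + specificity - 1) / denominator
    # Sampling error can push the raw estimate outside [0, 1]; clamp it.
    return min(1.0, max(0.0, estimate))


# Hypothetical observed share of Francophones in a home care cohort (3%),
# corrected with the home care sensitivity and specificity against FOLS.
estimate = corrected_prevalence(observed=0.03, sensitivity=0.531, specificity=0.998)
print(f"corrected prevalence: {estimate:.4f}")
```

With modest sensitivity and very high specificity, the corrected figure is higher than the observed one, which matches the direction of under-ascertainment implied by the results above.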

Keywords: Administrative health data; Case ascertainment; Francophones; Linguistic variables; Validity.

Publication types

  • Research Support, Non-U.S. Gov't
  • Validation Study

MeSH terms

  • Adult
  • Aged
  • Communication Barriers
  • Databases, Factual / statistics & numerical data
  • Female
  • Health Surveys / methods
  • Health Surveys / statistics & numerical data
  • Home Care Services / standards
  • Home Care Services / statistics & numerical data
  • Humans
  • Language*
  • Long-Term Care / methods
  • Long-Term Care / standards
  • Long-Term Care / statistics & numerical data
  • Male
  • Middle Aged
  • Ontario
  • Reproducibility of Results