Identifying Cases of Shoulder Injury Related to Vaccine Administration (SIRVA) in the United States: Development and Validation of a Natural Language Processing Method

JMIR Public Health Surveill. 2022 May 24;8(5):e30426. doi: 10.2196/30426.

Abstract

Background: Shoulder injury related to vaccine administration (SIRVA) accounts for more than half of all claims received by the National Vaccine Injury Compensation Program. However, due to the difficulty of finding SIRVA cases in large health care databases, population-based studies are scarce.

Objective: The goal of the research was to develop a natural language processing (NLP) method to identify SIRVA cases from clinical notes.

Methods: We conducted the study among members of a large integrated health care organization who were vaccinated between April 1, 2016, and December 31, 2017, and had subsequent diagnosis codes indicative of shoulder injury. Based on a training data set with a chart review reference standard of 164 cases, we developed an NLP algorithm to extract shoulder disorder information, including prior vaccination, anatomic location, temporality, and causality. The algorithm identified 3 groups of positive SIRVA cases (definite, probable, and possible) based on the strength of evidence. To validate the algorithm, we compared NLP results to a chart review reference standard of 100 vaccinated cases (the validation sample). We then applied the final automated NLP algorithm to a broader cohort of vaccinated persons with a shoulder injury diagnosis code and performed manual chart confirmation on a random sample of NLP-identified definite cases and all NLP-identified probable and possible cases.
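
As a rough illustration of grouping cases by strength of evidence, the sketch below maps four extracted evidence flags to the three positive groups. The abstract does not state the actual decision rules, so the `NoteEvidence` fields, the `classify_case` function, and the tiering logic are all hypothetical assumptions and not the authors' algorithm.

```python
from dataclasses import dataclass


@dataclass
class NoteEvidence:
    """Hypothetical evidence flags extracted from one patient's clinical notes."""
    prior_vaccination: bool   # note mentions a preceding vaccination
    shoulder_location: bool   # symptoms are in the shoulder or injected arm
    temporal_link: bool       # onset is described as shortly after vaccination
    causal_statement: bool    # note attributes the injury to the vaccination


def classify_case(evidence: NoteEvidence) -> str:
    """Assign an evidence tier. The rules are illustrative assumptions only."""
    # Without a vaccination mention and a shoulder location there is no candidate.
    if not (evidence.prior_vaccination and evidence.shoulder_location):
        return "negative"
    # Assumed: both temporal and causal evidence is the strongest tier.
    if evidence.temporal_link and evidence.causal_statement:
        return "definite"
    # Assumed: an explicit causal statement alone outranks timing alone.
    if evidence.causal_statement:
        return "probable"
    if evidence.temporal_link:
        return "possible"
    return "negative"


example = NoteEvidence(True, True, True, False)
print(classify_case(example))
```

A tiered output of this kind lets manual review effort be concentrated on the weaker-evidence groups, which is consistent with the confirmation strategy described above.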

Results: In the validation sample, the NLP algorithm had 100% accuracy for identifying 4 SIRVA cases and 96 cases without SIRVA. In the broader cohort of 53,585 vaccinations, the NLP algorithm identified 291 definite, 124 probable, and 52 possible SIRVA cases. The chart-confirmation rates for these groups were 95.5% (278/291), 67.7% (84/124), and 17.3% (9/52), respectively.
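
The chart-confirmation rates above are simple proportions of confirmed to reviewed cases in each group. The short sketch below reproduces that arithmetic from the reported counts; the names `reported_counts` and `confirmation_rate` are illustrative and not taken from the study.

```python
# Reported counts per NLP-identified group: (chart-confirmed, reviewed).
reported_counts = {
    "definite": (278, 291),
    "probable": (84, 124),
    "possible": (9, 52),
}


def confirmation_rate(confirmed: int, reviewed: int) -> float:
    """Return the chart-confirmation rate as a percentage."""
    return 100.0 * confirmed / reviewed


rates = {
    group: round(confirmation_rate(confirmed, reviewed), 1)
    for group, (confirmed, reviewed) in reported_counts.items()
}

for group, rate in rates.items():
    print(f"{group}: {rate}%")
# definite: 95.5%
# probable: 67.7%
# possible: 17.3%
```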

Conclusions: The algorithm performed with high sensitivity and reasonable specificity in identifying positive SIRVA cases. The NLP algorithm can potentially be used in future population-based studies to identify this rare adverse event, avoiding labor-intensive chart review validation.

Keywords: EHR; NLP; SIRVA; artificial intelligence; big data; causal relation; electronic health records; health; informatics; natural language processing; pharmacovigilance; population health; real-world data; shoulder injury related to vaccine administration; temporal relation; vaccine safety; vaccines.

Publication types

  • Validation Study
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Algorithms
  • Humans
  • Natural Language Processing
  • Shoulder Injuries* / epidemiology
  • Shoulder Injuries* / etiology
  • United States / epidemiology
  • Vaccination* / adverse effects
  • Vaccines* / adverse effects

Substances

  • Vaccines