Identifying Cases of Shoulder Injury Related to Vaccine Administration (SIRVA) in the United States: Development and Validation of a Natural Language Processing Method

Chengyi Zheng; Jonathan Duffy; In-Lu Amy Liu; Lina S Sy; Ronald A Navarro; Sunhea S Kim; Denison S Ryan; Wansu Chen; Lei Qian; Cheryl Mercado; Steven J Jacobsen

doi:10.2196/30426

JMIR Public Health Surveill. 2022 May 24;8(5):e30426. doi: 10.2196/30426.

Affiliations

¹ Department of Research and Evaluation, Kaiser Permanente Southern California, Pasadena, CA, United States.
² Immunization Safety Office, Centers for Disease Control and Prevention, Atlanta, GA, United States.
³ Kaiser Permanente South Bay Medical Center, Harbor City, CA, United States.

PMID: 35608886
PMCID: PMC9175103
DOI: 10.2196/30426

Abstract

Background: Shoulder injury related to vaccine administration (SIRVA) accounts for more than half of all claims received by the National Vaccine Injury Compensation Program. However, due to the difficulty of finding SIRVA cases in large health care databases, population-based studies are scarce.

Objective: The goal of the research was to develop a natural language processing (NLP) method to identify SIRVA cases from clinical notes.

Methods: We conducted the study among members of a large integrated health care organization who were vaccinated between April 1, 2016, and December 31, 2017, and had subsequent diagnosis codes indicative of shoulder injury. Based on a training data set with a chart review reference standard of 164 cases, we developed an NLP algorithm to extract shoulder disorder information, including prior vaccination, anatomic location, temporality, and causality. The algorithm identified 3 groups of positive SIRVA cases (definite, probable, and possible) based on the strength of evidence. We compared NLP results to a chart review reference standard of 100 vaccinated cases. We then applied the final automated NLP algorithm to a broader cohort of vaccinated persons with a shoulder injury diagnosis code and performed manual chart confirmation on a random sample of NLP-identified definite cases and all NLP-identified probable and possible cases.
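As a rough illustration of how extracted evidence could be mapped to strength-of-evidence groups, the sketch below uses the four information types named above (prior vaccination, anatomic location, temporality, causality). The abstract does not state the actual decision rules, so the `NoteEvidence` fields, the `classify` function, and every threshold here are hypothetical assumptions, not the study's algorithm.

```python
from dataclasses import dataclass


@dataclass
class NoteEvidence:
    """Hypothetical flags an NLP step might extract from a patient's notes."""
    prior_vaccination: bool    # notes mention a preceding vaccination
    location_matches: bool     # symptoms are in the vaccinated shoulder
    onset_after_vaccine: bool  # symptom onset follows the injection
    causal_statement: bool     # clinician attributes the injury to the vaccine


def classify(e: NoteEvidence) -> str:
    """Assign an illustrative evidence tier (assumed rules, not the study's)."""
    if not e.prior_vaccination:
        return "negative"
    if e.causal_statement and e.location_matches and e.onset_after_vaccine:
        return "definite"
    if e.location_matches and e.onset_after_vaccine:
        return "probable"
    if e.location_matches or e.onset_after_vaccine or e.causal_statement:
        return "possible"
    return "negative"


if __name__ == "__main__":
    example = NoteEvidence(True, True, True, False)
    print(classify(example))
```

Ordering the checks from strongest to weakest evidence keeps the tiers mutually exclusive; a real system would derive such rules from the chart-reviewed training set rather than fix them by hand.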

Results: In the validation sample, the NLP algorithm had 100% accuracy for identifying 4 SIRVA cases and 96 cases without SIRVA. In the broader cohort of 53,585 vaccinations, the NLP algorithm identified 291 definite, 124 probable, and 52 possible SIRVA cases. The chart-confirmation rates for these groups were 95.5% (278/291), 67.7% (84/124), and 17.3% (9/52), respectively.
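The chart-confirmation rates can be checked directly from the counts reported above. The short sketch below recomputes each rate as confirmed cases divided by NLP-identified cases; the `groups` dictionary and the `confirmation_rate` helper are illustrative names, and only the counts come from the abstract.

```python
# Counts from the Results paragraph: (chart-confirmed, NLP-identified)
groups = {
    "definite": (278, 291),
    "probable": (84, 124),
    "possible": (9, 52),
}


def confirmation_rate(confirmed: int, identified: int) -> float:
    """Return the chart-confirmation rate as a percentage."""
    return 100.0 * confirmed / identified


# Round to one decimal place, matching the precision used in the abstract.
rates = {
    name: round(confirmation_rate(confirmed, identified), 1)
    for name, (confirmed, identified) in groups.items()
}

if __name__ == "__main__":
    for name, rate in rates.items():
        print(f"{name}: {rate}%")
```

The recomputed values agree with the reported 95.5%, 67.7%, and 17.3%.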

Conclusions: The algorithm performed with high sensitivity and reasonable specificity in identifying positive SIRVA cases. The NLP algorithm can potentially be used in future population-based studies to identify this rare adverse event, avoiding labor-intensive chart review validation.

Keywords: EHR; NLP; SIRVA; artificial intelligence; big data; causal relation; electronic health records; health; informatics; natural language processing; pharmacovigilance; population health; real-world data; shoulder injury related to vaccine administration; temporal relation; vaccine safety; vaccines.

©Chengyi Zheng, Jonathan Duffy, In-Lu Amy Liu, Lina S Sy, Ronald A Navarro, Sunhea S Kim, Denison S Ryan, Wansu Chen, Lei Qian, Cheryl Mercado, Steven J Jacobsen. Originally published in JMIR Public Health and Surveillance (https://publichealth.jmir.org), 24.05.2022.

Publication types

Validation Study
Research Support, U.S. Gov't, P.H.S.

MeSH terms

Algorithms
Humans
Natural Language Processing
Shoulder Injuries* / epidemiology
Shoulder Injuries* / etiology
United States / epidemiology
Vaccination* / adverse effects
Vaccines* / adverse effects

Substances

Vaccines