Integrating topic modeling and word embedding to characterize violent deaths

Proc Natl Acad Sci U S A. 2022 Mar 8;119(10):e2108801119. doi: 10.1073/pnas.2108801119. Epub 2022 Mar 3.

Abstract

Significance: We introduce an approach to identify latent topics in large-scale text data. Our approach integrates two prominent methods of computational text analysis: topic modeling and word embedding. We apply our approach to written narratives of violent death (e.g., suicides and homicides) in the National Violent Death Reporting System (NVDRS). Many of our topics reveal aspects of violent death not captured in existing classification schemes. We also extract gender bias in the topics themselves (e.g., a topic about long guns is particularly masculine). Our findings suggest new lines of research that could contribute to reducing suicides or homicides. Our methods are broadly applicable to text data and can unlock similar information in other administrative databases.

Keywords: gender; mortality surveillance; natural language processing; topic models; word embeddings.

MeSH terms

  • Databases, Factual*
  • Homicide*
  • Humans
  • Models, Theoretical*
  • United States
  • Violence*