Using Artificial Intelligence to extract information on pathogen characteristics from scientific publications

Int J Hyg Environ Health. 2022 Aug:245:114018. doi: 10.1016/j.ijheh.2022.114018. Epub 2022 Aug 16.

Abstract

Health risk assessment of environmental exposure to pathogens requires complete and up-to-date knowledge. With the rapid growth of scientific publications and the protocolization of literature reviews, an automated approach based on Artificial Intelligence (AI) techniques could help extract meaningful information from the literature and make literature reviews more efficient. The objective of this research was to determine whether it is feasible to extract both qualitative and quantitative information from scientific publications about the waterborne pathogen Legionella on PubMed, using Deep Learning and Natural Language Processing techniques. The model extracted the qualitative and quantitative characteristics with a precision, recall, and F-score of 0.91, 0.80, and 0.85, respectively. The AI extraction yielded results that were comparable to manual information extraction. Overall, AI could reliably extract both qualitative and quantitative information about Legionella from scientific literature. Our study paves the way for a better understanding of information extraction processes and is a first step towards harnessing AI to collect meaningful information on pathogen characteristics from environmental microbiology publications.
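As a quick consistency check on the reported metrics, the sketch below recomputes the F-score from the stated precision and recall. It assumes the reported F-score is the balanced F1 (the harmonic mean of precision and recall), which the abstract does not state explicitly; the function name `f1_score` is illustrative and not taken from the study.

```python
def f1_score(precision: float, recall: float) -> float:
    """Balanced F-score: harmonic mean of precision and recall.

    Assumption: the abstract's "F-score" is F1; the source does not say so.
    """
    if precision + recall == 0:
        # Avoid division by zero when both metrics are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values as reported in the abstract.
reported_precision = 0.91
reported_recall = 0.80

f1 = f1_score(reported_precision, reported_recall)
print(round(f1, 2))  # 0.85
```

Under that assumption, the recomputed value rounds to the reported 0.85, so the three figures are mutually consistent.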

Keywords: Artificial intelligence; Exposure assessment; Information extraction; Legionella; Scientific publications.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Artificial Intelligence*