NeuroCORD: A Language Model to Facilitate COVID-19-Associated Neurological Disorder Studies

Leihong Wu; Syed Ali; Heather Ali; Tyrone Brock; Joshua Xu; Weida Tong

doi:10.3390/ijerph19169974

Int J Environ Res Public Health. 2022 Aug 12;19(16):9974. doi: 10.3390/ijerph19169974.

Authors

Leihong Wu¹, Syed Ali¹, Heather Ali², Tyrone Brock¹ ³, Joshua Xu¹, Weida Tong¹

Affiliations

¹ National Center for Toxicological Research, Food and Drug Administration, 3900 NCTR Rd., Jefferson, AR 72079, USA.
² Department of Internal Medicine, University of Arkansas for Medical Sciences, 4301 West Markham, Little Rock, AR 72205, USA.
³ Department of Mathematics and Computer Science, University of Arkansas at Pine Bluff, 1200 University Drive, Pine Bluff, AR 71601, USA.

Abstract

COVID-19 can lead to multiple severe outcomes, including neurological and psychological impacts. However, it is challenging to manually scan hundreds of thousands of COVID-19 articles on a regular basis. To update our knowledge, provide sound science to the public, and communicate effectively, it is critical to have an efficient means of following the most current published data. In this study, we developed a language model that uses state-of-the-art artificial intelligence (AI) to search abstracts and accurately retrieve articles on COVID-19-associated neurological disorders. We applied this NeuroCORD model to CORD-19, the largest benchmark dataset of COVID-19 literature. We found that the model developed on the training set yielded 94% prediction accuracy on the test set. This result was subsequently verified by two experts in the field. In addition, when applied to 96,000 unlabeled articles that were published after 2020, the NeuroCORD model accurately identified approximately 3% of them as relevant to the study of COVID-19-associated neurological disorders, while only 0.5% were retrieved using conventional keyword searching. In conclusion, NeuroCORD provides an opportunity to profile neurological disorders resulting from COVID-19 in a rapid and efficient fashion, and its general framework could be used to study other COVID-19-related emerging health issues.
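The two figures quoted in the abstract, test-set accuracy and the share of articles retrieved, can be illustrated with a minimal sketch. It assumes a plain keyword baseline and is not the authors' implementation: the keyword list, the function names, and the sample abstracts are all hypothetical. The model side is represented only by a list of predictions compared against reference labels.

```python
import re

# Hypothetical keyword list for illustration only; NOT the terms used in the study.
NEURO_KEYWORDS = ["neurological", "encephalitis", "stroke", "anosmia", "seizure"]


def keyword_match(abstract, keywords=NEURO_KEYWORDS):
    """Return True if any keyword occurs as a whole word (case-insensitive)."""
    text = abstract.lower()
    return any(re.search(r"\b" + re.escape(k) + r"\b", text) for k in keywords)


def retrieval_rate(abstracts, predicate):
    """Fraction of abstracts that the given predicate flags as relevant."""
    if not abstracts:
        return 0.0
    flagged = sum(1 for a in abstracts if predicate(a))
    return flagged / len(abstracts)


def accuracy(predictions, labels):
    """Share of predictions that agree with the reference labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        return 0.0
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


if __name__ == "__main__":
    # Made-up sample abstracts; the last one is relevant but contains no listed keyword.
    sample = [
        "Patients reported anosmia after infection.",
        "Vaccine cold-chain logistics in rural regions.",
        "Ischemic stroke in COVID-19 patients.",
        "Loss of smell was common among mild cases.",
    ]
    labels = [True, False, True, True]
    predictions = [keyword_match(a) for a in sample]
    print("retrieval rate:", retrieval_rate(sample, keyword_match))
    print("accuracy vs. labels:", accuracy(predictions, labels))
```

In this toy sample the last abstract is relevant but contains none of the listed terms, which is the kind of miss that could explain why a keyword search retrieves fewer articles than a trained classifier.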

Keywords: BERT model; COVID-19; information retrieval; language model; machine learning; neurological disorders; text mining.

MeSH terms

Artificial Intelligence
COVID-19*
Humans
Language
Nervous System Diseases* / epidemiology
Nervous System Diseases* / etiology

Grants and funding

This research received no external funding.