Machine Learning Maps Research Needs in COVID-19 Literature

Anhvinh Doanvo; Xiaolu Qian; Divya Ramjee; Helen Piontkivska; Angel Desai; Maimuna Majumder

doi:10.1016/j.patter.2020.100123

Patterns (N Y). 2020 Dec 11;1(9):100123. doi: 10.1016/j.patter.2020.100123. Epub 2020 Sep 16.

Authors

Anhvinh Doanvo¹, Xiaolu Qian², Divya Ramjee³, Helen Piontkivska⁴, Angel Desai⁵, Maimuna Majumder⁶,⁷

Affiliations

¹ COVID-19 Dispersed Volunteer Research Network, Washington, DC, USA.
² University of Washington, Seattle, WA, USA.
³ Department of Justice, Law & Criminology, American University, Washington, DC, USA.
⁴ Department of Biological Sciences, Kent State University, Kent, OH, USA.
⁵ University of California, Davis, Sacramento, CA, USA.
⁶ Harvard Medical School, Boston, MA, USA.
⁷ Children's Hospital Computational Health Informatics Program (CHIP), Boston, MA, USA.

Abstract

As of August 2020, thousands of COVID-19 (coronavirus disease 2019) publications have been produced. Manual assessment of their scope is an overwhelming task, and shortcuts through metadata analysis (e.g., keywords) assume that studies are properly tagged. However, machine learning approaches can rapidly survey the actual text of publication abstracts to identify research overlap between COVID-19 and other coronaviruses, research hotspots, and areas warranting exploration. We propose a fast, scalable, and reusable framework to parse novel disease literature. When applied to the COVID-19 Open Research Dataset, dimensionality reduction suggests that COVID-19 studies to date are primarily clinical, modeling, or field based, in contrast to the vast quantity of laboratory-driven research for other (non-COVID-19) coronavirus diseases. Furthermore, topic modeling indicates that COVID-19 publications have focused on public health, outbreak reporting, clinical care, and testing for coronaviruses, as opposed to the more limited number focused on basic microbiology, including pathogenesis and transmission.

Keywords: COVID-19; PCA; SARS-CoV-2; artificial intelligence; coronavirus; data science; dimensionality reduction; machine learning; natural language processing; topic modeling.
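
The dimensionality-reduction idea described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the authors' pipeline, so everything below is an assumption made for illustration: the four toy "abstracts" are invented stand-ins (not records from the COVID-19 Open Research Dataset), the function names are hypothetical, and the TF-IDF weighting plus power-iteration PCA is only one plausible way to project text onto a principal component.

```python
import math
import re
from collections import Counter

# Invented toy abstracts (an assumption; not taken from the real dataset).
ABSTRACTS = [
    "patients hospital clinical symptoms patients treatment",
    "clinical outcomes hospital patients ventilation",
    "protein spike receptor binding cell assay",
    "cell culture protein replication assay receptor",
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_matrix(docs):
    """Return (vocab, rows): one L2-normalized TF-IDF row per document."""
    tokenized = [tokenize(doc) for doc in docs]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(tok for doc in tokenized for tok in set(doc))
    # Smoothed inverse document frequency.
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1 for t in vocab}
    rows = []
    for doc in tokenized:
        term_freq = Counter(doc)
        row = [term_freq[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in row)) or 1.0
        rows.append([x / norm for x in row])
    return vocab, rows


def first_principal_component(rows, iterations=200):
    """Estimate the first principal component by power iteration.

    Returns (component, scores): the unit-length loading vector over the
    vocabulary and each document's projection onto it.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Deterministic start: the first centered document.
    v = centered[0][:]
    for _ in range(iterations):
        # Multiply by X^T X without forming the covariance matrix.
        proj = [sum(x * w for x, w in zip(row, v)) for row in centered]
        v = [sum(proj[i] * centered[i][j] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:
            raise ValueError("documents are identical; no variance to explain")
        v = [x / norm for x in v]
    scores = [sum(x * w for x, w in zip(row, v)) for row in centered]
    return v, scores


if __name__ == "__main__":
    vocab, rows = tfidf_matrix(ABSTRACTS)
    component, scores = first_principal_component(rows)
    for text, score in zip(ABSTRACTS, scores):
        print(f"{score:+.3f}  {text}")
    ranked = sorted(zip(component, vocab))
    print("most negative terms:", [t for _, t in ranked[:3]])
    print("most positive terms:", [t for _, t in ranked[-3:]])
```

In this toy setting the clinically worded texts and the laboratory-worded texts fall on opposite sides of the first component, which mirrors the kind of clinical versus laboratory contrast the abstract reports. On a real corpus, a library implementation of TF-IDF and PCA would likely be preferable to this hand-rolled version.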
