Global Research on Coronaviruses: An R Package

J Med Internet Res. 2020 Aug 11;22(8):e19615. doi: 10.2196/19615.

Abstract

Background: During the COVID-19 pandemic, we developed an R package of bibliographic references on coronaviruses. Working with reproducible research principles based on open science, disseminating scientific information, providing easy access to scientific production on this particular issue, and offering rapid integration into researchers' workflows may help save time in this race against the virus, notably in terms of public health.

Objective: The goal is to simplify the workflow of interested researchers, with multidisciplinary research in mind. With more than 60,500 medical bibliographic references at the time of publication, this package is among the largest about coronaviruses.

Methods: This package could be of interest to epidemiologists, researchers in scientometrics, biostatisticians, and data scientists broadly defined. The package collects references from PubMed and organizes the data in a data frame. We then built functions to sort through this collection of references. Researchers can also integrate the data into their own pipelines and use them directly in their R code.
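
To make the idea of sorting through a tabular collection of references concrete, here is a minimal sketch. It is written in Python for illustration only and is not the package's R interface: the `Reference` record, its fields, the `search_references` helper, and the sample rows are all hypothetical placeholders, not data or function names taken from the package.

```python
from dataclasses import dataclass


@dataclass
class Reference:
    """One row of a hypothetical reference table (field names are assumptions)."""
    pmid: str
    title: str
    year: int
    abstract: str


def search_references(refs, term, since=None):
    """Return references whose title or abstract contains `term`.

    Matching is case-insensitive. If `since` is given, only references
    published in that year or later are kept. Results are sorted newest first.
    """
    needle = term.lower()
    hits = [
        r for r in refs
        if needle in r.title.lower() or needle in r.abstract.lower()
    ]
    if since is not None:
        hits = [r for r in hits if r.year >= since]
    return sorted(hits, key=lambda r: r.year, reverse=True)


# Placeholder rows for illustration only; these are not real PubMed records.
references = [
    Reference("placeholder-1", "Coronavirus transmission in households", 2020, ""),
    Reference("placeholder-2", "Seasonal influenza surveillance", 2019,
              "Includes a comparison with coronavirus incidence."),
    Reference("placeholder-3", "SARS coronavirus outbreak review", 2003, ""),
]

for ref in search_references(references, "coronavirus", since=2019):
    print(ref.year, ref.pmid, ref.title)
```

Keeping the filter as a plain function over rows mirrors the data frame approach described above: the same collection can be narrowed by keyword and year before it is handed to a downstream analysis.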

Results: We provide a short use case in this paper based on a bibliometric analysis of the references made available by this package. Classification techniques can also be used to go through the large volume of references and allow researchers to save time on this part of their research. Network analysis can be used to filter the data set. Text mining techniques can also help researchers calculate similarity indices and help them focus on the parts of the literature that are relevant for their research.
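
As one example of a similarity index of the kind mentioned above, the following sketch ranks texts against a query by cosine similarity over raw word counts. This is an assumption chosen for illustration, not necessarily the measure used in the paper, and it is written in Python rather than R; the helper names `tokenize`, `cosine_similarity`, and `rank_by_similarity` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-count vectors of two texts.

    Returns a value between 0.0 (no shared tokens) and 1.0 (same token
    proportions). Returns 0.0 if either text has no tokens.
    """
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))
    dot = sum(counts_a[t] * counts_b[t] for t in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, texts):
    """Return (score, text) pairs sorted from most to least similar to `query`."""
    scored = [(cosine_similarity(query, t), t) for t in texts]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

Calling `rank_by_similarity` with a short description of a research question and a list of abstracts puts the closest matches first, which is one way to focus on the relevant part of a large reference collection. Weighting schemes such as TF-IDF would be a natural refinement but are outside this sketch.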

Conclusions: This package aims at accelerating research on coronaviruses. Epidemiologists can integrate this package into their workflow. Because we update the package daily, it is also possible to add a machine learning layer on top of it to model the latest advances in research about coronaviruses. To the best of our knowledge, it is also the only package of this size built in the R language.

Keywords: COVID-19; R package; SARS-CoV-2; bibliometric; coronavirus; infectious disease; informatics; reference; virus.

MeSH terms

  • Betacoronavirus*
  • COVID-19
  • Coronavirus Infections*
  • Humans
  • Language
  • Machine Learning
  • Pandemics*
  • Pneumonia, Viral*
  • PubMed
  • Publishing
  • SARS-CoV-2