Expanding Our Understanding of COVID-19 from Biomedical Literature Using Word Embedding

Heyoung Yang; Eunsoo Sohn

doi:10.3390/ijerph18063005

Int J Environ Res Public Health. 2021 Mar 15;18(6):3005. doi: 10.3390/ijerph18063005.

Authors

Heyoung Yang¹, Eunsoo Sohn¹

Affiliation

¹ Future Technology Analysis Center, Korea Institute of Science and Technology Information, 66, Hoegi-ro, Dongdaemun-gu, Seoul 02456, Korea.

Abstract

A better understanding of the clinical characteristics of coronavirus disease 2019 (COVID-19) is urgently required to address this health crisis. Numerous researchers and pharmaceutical companies are working to develop vaccines and treatments; however, a clear solution has yet to be found. The current study proposes using artificial intelligence methods to comprehend biomedical knowledge and infer the characteristics of COVID-19. A biomedical knowledge base was established by applying FastText, a word embedding technique, to PubMed literature from the past decade. Subsequently, a new knowledge base was created using recently published COVID-19 articles. From this newly constructed knowledge base, a list of anti-infective drugs and proteins of either human or coronavirus origin was inferred to be related to COVID-19, because they are located close to COVID-19 in the knowledge base. This study attempted to develop a method for quickly inferring information related to COVID-19 from the existing knowledge base, before sufficient knowledge about COVID-19 has accumulated. With COVID-19 not yet fully overcome, machine learning-based analysis of the PubMed literature can provide broad guidance for researchers and pharmaceutical companies working on treatments for COVID-19.
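
The inference step described in the abstract, treating terms that lie close to COVID-19 in the embedding space as candidates for being related, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes closeness is measured by cosine similarity, and the function names (`cosine`, `nearest_terms`), the three-dimensional vectors, and the placeholder terms `drug_a`, `protein_b`, and `term_c` are all hypothetical stand-ins for vectors that a trained FastText model would supply.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_terms(query, embeddings, candidates, top_k=3):
    """Rank candidate terms by cosine similarity to the query term's vector."""
    query_vec = embeddings[query]
    scored = [
        (term, cosine(query_vec, embeddings[term]))
        for term in candidates
        if term != query and term in embeddings
    ]
    # Highest similarity first, i.e. closest in the embedding space.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy vectors, invented for illustration only. They are NOT
# taken from the study and carry no biomedical meaning.
toy_embeddings = {
    "covid-19": [0.9, 0.1, 0.0],
    "drug_a": [0.8, 0.2, 0.1],
    "protein_b": [0.7, 0.0, 0.3],
    "term_c": [0.0, 1.0, 0.0],
}

ranked = nearest_terms(
    "covid-19", toy_embeddings, ["drug_a", "protein_b", "term_c"], top_k=2
)
for term, score in ranked:
    print(f"{term}\t{score:.3f}")
```

In practice the vectors would come from a model trained on the literature. With the gensim library, for example, a trained `FastText` model exposes `model.wv.most_similar(term, topn=k)`, which performs the same kind of ranking over the full vocabulary. Whether the study used that library or that similarity measure is not stated in the abstract.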

Keywords: COVID-19; PubMed literature; drug repurposing; machine learning; medical subject headings; substance name; word embedding.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Artificial Intelligence
COVID-19*
Coronavirus Infections*
Humans
Machine Learning
SARS-CoV-2

Grants and funding

K-20-L03-C01-S01, K-20-L04-C03-S04/Korea Institute of Science and Technology Information