Text classification technique for discovering country-based publications from international COVID-19 publications

Digit Health. 2023 Jul 3:9:20552076231185674. doi: 10.1177/20552076231185674. eCollection 2023 Jan-Dec.

Abstract

Objective: The significant increase in the number of COVID-19 publications, together with the strategic importance of this subject area for research and treatment systems in the health field, makes text-mining research more necessary than ever. The main objective of this paper is to discover country-based publications among international COVID-19 publications using text classification techniques.

Methods: This applied study used text-mining techniques, namely clustering and text classification. The statistical population comprised all COVID-19 publications in PubMed Central® (PMC) from November 2019 to June 2021. Latent Dirichlet allocation (LDA) was used for clustering, and a support vector machine (SVM), implemented in Python with the scikit-learn library, was used for text classification. Text classification was applied to assess the consistency between Iranian and international topics.
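The clustering-then-classification pipeline described in the Methods can be sketched with scikit-learn as follows. This is a minimal sketch under stated assumptions, not the authors' code: the helper names, the preprocessing (English stop-word removal, raw counts for LDA and TF-IDF for the classifier), labeling each document by its dominant LDA topic, and the use of `LinearSVC` (a linear-kernel SVM; the abstract does not state the kernel) are all illustrative choices.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.svm import LinearSVC

N_TOPICS = 7  # number of LDA topics reported in the abstract


def lda_topic_labels(docs, n_topics=N_TOPICS, seed=0):
    """Cluster documents with LDA and label each one by its dominant topic.

    Assumption: dominant-topic labeling is an illustrative choice,
    not a detail confirmed by the abstract.
    """
    counts = CountVectorizer(stop_words="english").fit_transform(docs)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=seed)
    doc_topic = lda.fit_transform(counts)  # (n_docs, n_topics) topic proportions
    return doc_topic.argmax(axis=1)


def train_topic_classifier(docs, labels):
    """Fit a linear SVM on TF-IDF features to predict topic labels."""
    vectorizer = TfidfVectorizer(stop_words="english")
    features = vectorizer.fit_transform(docs)
    clf = LinearSVC()  # assumed linear kernel; the abstract only says "SVM"
    clf.fit(features, labels)
    return vectorizer, clf


def classify(vectorizer, clf, docs):
    """Assign each new document to one of the learned topics."""
    return clf.predict(vectorizer.transform(docs))
```

Under this reading, one would label the international publications with `lda_topic_labels`, fit the classifier on those labels, and then run `classify` on the Iranian publications to see which international topics they fall into.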

Results: The LDA algorithm extracted seven topics from both international and Iranian COVID-19 publications. The topic "Social and Technology in COVID-19" had the largest share of publications at both the international level (April 2021, 50.61%) and the national level (February 2021, 39.44%). Publication output peaked in April 2021 internationally and in February 2021 nationally.
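Monthly topic shares and the peak publication month, as reported in the Results, can be computed from per-publication topic assignments. A minimal sketch, assuming each publication has already been assigned a single topic and a publication month, and reading "share" as the percentage of that month's publications that fall in each topic; the function names, the record format, and the toy records are hypothetical and are not data from the study.

```python
from collections import Counter, defaultdict


def topic_share_by_month(records):
    """Percentage of each month's publications assigned to each topic.

    records: iterable of (month, topic) pairs, one pair per publication
    (hypothetical format).
    """
    per_month = defaultdict(Counter)
    for month, topic in records:
        per_month[month][topic] += 1
    shares = {}
    for month, counts in per_month.items():
        total = sum(counts.values())
        shares[month] = {t: round(100 * n / total, 2) for t, n in counts.items()}
    return shares


def peak_month(records):
    """Month with the largest number of publications."""
    per_month = Counter(month for month, _ in records)
    return per_month.most_common(1)[0][0]


# Toy records for illustration only; not data from the study.
toy = [
    ("2021-03", "Covid-19 Proteins: Vaccine and Antibody Response"),
    ("2021-04", "Social and Technology in COVID-19"),
    ("2021-04", "Social and Technology in COVID-19"),
    ("2021-04", "Covid-19 Proteins: Vaccine and Antibody Response"),
]
print(topic_share_by_month(toy)["2021-04"])
print(peak_month(toy))  # → 2021-04
```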

Conclusion: One of the most important results of this study was the discovery of a common trend and consistency between Iranian and international publications on COVID-19. In particular, in the topic category "Covid-19 Proteins: Vaccine and Antibody Response," Iranian publications follow the same publishing and research trend as international ones.

Keywords: COVID-19; Publication; artificial intelligence; machine learning; python; text classification; text mining.