Cluster-based text mining for extracting drug candidates for the prevention of COVID-19 from the biomedical literature

Ahmad Afif Supianto; Rizky Nurdiansyah; Chia-Wei Weng; Vicky Zilvan; Raden Sandra Yuwana; Andria Arisal; Hilman Ferdinandus Pardede; Min-Min Lee; Chien-Hung Huang; Ka-Lok Ng

doi:10.1016/j.jtumed.2022.12.015

J Taibah Univ Med Sci. 2023 Aug;18(4):787-801. doi: 10.1016/j.jtumed.2022.12.015. Epub 2023 Jan 4.

Affiliations

¹ Research Center for Data and Information Sciences, National Research and Innovation Agency, Indonesia.
² Department of Bioinformatics, Indonesia International Institute for Life Sciences, Indonesia.
³ Institute of Medicine, Chung Shan Medical University, Taichung, Taiwan.
⁴ Department of Food Nutrition and Health Biotechnology, Asia University, Taiwan.
⁵ Department of Computer Science and Information Engineering, National Formosa University, Taiwan.
⁶ Department of Bioinformatics and Medical Engineering, Asia University, Taiwan.
⁷ Department of Medical Research, China Medical University Hospital, China Medical University, Taiwan.
⁸ Center for Artificial Intelligence and Precision Medicine Research, Asia University, Taiwan.

Abstract
in English, Arabic

Objective: The coronavirus disease 2019 (COVID-19) health crisis that began at the end of 2019 set researchers around the world racing to find effective solutions. The related literature grew explosively, making an automated approach, namely text mining, necessary for finding information useful against COVID-19, especially for drug candidate discovery. Most text mining methods for finding drug candidates try to extract bioentity associations from PubMed, but very few of them use a clustering approach. The purpose of this study was to demonstrate the effectiveness of our approach in identifying drugs for the prevention of COVID-19 through literature review, cluster analysis, drug docking calculations, and clinical trial data.

Methods: This research was conducted in four main stages. First, the text mining stage used Bidirectional Encoder Representations from Transformers for Biomedical Text Mining (BioBERT) to obtain a vector representation of each word in each sentence of the texts. Second, disease-drug associations were generated from the correlation between disease and drug. Third, the clustering stage grouped the rules by disease similarity, using Term Frequency-Inverse Document Frequency (TF-IDF) as the feature. Finally, the drug candidate extraction stage drew on the PubChem and DrugBank databases. We further used the drug docking package AutoDock Vina in the PyRx software to verify the results.
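The clustering stage can be illustrated with a minimal sketch. The abstract does not name the clustering algorithm, the similarity measure, or any threshold, so everything below is an assumption made for illustration: a hand-rolled TF-IDF weighting over disease names, cosine similarity, a greedy single-pass grouping, the `threshold` value, and the placeholder drug names `drug_a` to `drug_d`. It is not the authors' implementation.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # Relative term frequency times a smoothed inverse document frequency.
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_rules(rules, threshold=0.3):
    """Greedy single-pass grouping of (disease, drug) rules by disease similarity.

    Each rule joins the first cluster whose seed disease is similar enough;
    otherwise it starts a new cluster. The threshold is an assumed value.
    """
    vectors = tfidf_vectors([disease for disease, _ in rules])
    clusters = []  # list of (seed_vector, member_rules)
    for rule, vec in zip(rules, vectors):
        for seed, members in clusters:
            if cosine(seed, vec) >= threshold:
                members.append(rule)
                break
        else:
            clusters.append((vec, [rule]))
    return [members for _, members in clusters]


# Hypothetical disease-drug rules; the drug names are placeholders, not findings.
rules = [
    ("severe acute respiratory syndrome", "drug_a"),
    ("acute respiratory distress syndrome", "drug_b"),
    ("type 2 diabetes mellitus", "drug_c"),
    ("diabetes mellitus", "drug_d"),
]
clusters = cluster_rules(rules)
for members in clusters:
    print([drug for _, drug in members])
```

In this toy input the two respiratory rules share three terms and the two diabetes rules share two, so each pair falls into its own group, while rules with no shared disease terms have zero similarity and stay apart.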

Results: Comparative analyses showed that mining with clustering yielded a higher percentage of findings than mining without clustering in all experimental settings. In addition, we suggest that the top three drugs/phytochemicals identified by drug docking analysis may be effective in preventing COVID-19.

Conclusions: The proposed text mining method utilizing clustering is quite promising for discovering drug candidates for the prevention of COVID-19 from the biomedical literature.

Objectives: The COVID-19 health crisis that began at the end of 2019 has set researchers from all over the world racing to find effective solutions to date. Related research multiplied, and it was inevitable that an automated approach, namely text mining, would be needed to find useful information for overcoming COVID-19, particularly with regard to drug candidate discovery. While text mining methods for finding drug candidates mostly try to extract bioentity associations from PubMed, very few of them use a clustering approach. The purpose of this study is to demonstrate the effectiveness of our approach in identifying drugs for the prevention of COVID-19 through literature review, cluster analysis, drug docking calculations, and clinical trial data.

Methods: This research was conducted in four main stages. First, the text mining stage was carried out using BioBERT to obtain a vector representation of each word in the sentence from the texts. The next stage created the disease-drug associations, which are obtained from the correspondence between disease and drug. Next, the clustering stage grouped the rules by disease similarity, using TF-IDF as their features. Finally, the drug candidate extraction stage is processed by drawing on the PubChem and DrugBank databases. We also used the drug docking package AutoDock Vina in the PyRx software to verify the results.

Results: The comparative analysis showed that the percentage of findings from mining with clustering exceeded that from mining without clustering in all experimental settings. In addition, we suggested that the top three drugs/phytochemicals according to drug docking analysis may be effective in preventing COVID-19.

Conclusions: The proposed text mining method using the clustering approach is very promising for discovering drug candidates for the prevention of COVID-19 through the biomedical literature.

Keywords: COVID-19; Coronavirus; Drug docking; Phytochemicals; SARS-CoV-2; Text mining.
