Self-Attention-Based Models for the Extraction of Molecular Interactions from Biological Texts

Prashant Srivastava; Saptarshi Bej; Kristina Yordanova; Olaf Wolkenhauer

doi:10.3390/biom11111591

Biomolecules. 2021 Oct 27;11(11):1591. doi: 10.3390/biom11111591.

Authors

Prashant Srivastava¹, Saptarshi Bej¹,², Kristina Yordanova¹, Olaf Wolkenhauer¹,²

Affiliations

¹ Institute of Computer Science, University of Rostock, 18059 Rostock, Germany.
² Leibniz-Institute for Food Systems Biology, Technical University of Munich, 85354 Freising, Germany.

Abstract

For any molecule, network, or process of interest, keeping up with new publications is becoming increasingly difficult. For many cellular processes, the number of molecules and interactions that need to be considered can be very large. Automated mining of publications can support large-scale molecular interaction maps and database curation. Text mining and Natural-Language-Processing (NLP)-based techniques are finding applications in mining the biological literature, handling problems such as Named Entity Recognition (NER) and Relationship Extraction (RE). Both rule-based and Machine-Learning (ML)-based NLP approaches have been popular in this context, with multiple research and review articles examining the scope of such models in Biological Literature Mining (BLM). In this review article, we explore self-attention-based models, a special type of Neural-Network (NN)-based architecture that has recently revitalized the field of NLP, applied to biological texts. We cover self-attention models published from 2019 onwards that operate at either the sentence level or the abstract level, in the context of molecular interaction extraction. We conducted a comparative study of the models in terms of their architecture. We also discuss some limitations in the field of BLM that identify opportunities for the extraction of molecular interactions from biological text.

Keywords: biological literature mining; natural language processing; relationship extraction; self-attention models; text mining.

Publication types

Review

MeSH terms

Data Mining*
Machine Learning
Natural Language Processing*