Using BERT to identify drug-target interactions from whole PubMed

BMC Bioinformatics. 2022 Jun 21;23(1):245. doi: 10.1186/s12859-022-04768-x.

Abstract

Background: Drug-target interactions (DTIs) are critical for drug repurposing and elucidation of drug mechanisms, and are manually curated by large databases, such as ChEMBL, BindingDB, DrugBank and DrugTargetCommons. However, the number of curated articles likely constitutes only a fraction of all the articles that contain experimentally determined DTIs. Finding such articles and extracting the experimental information is a challenging task, and there is a pressing need for systematic approaches to assist the curation of DTIs. To this end, we applied Bidirectional Encoder Representations from Transformers (BERT) to identify such articles. Because DTI data depend closely on the type of assay used to generate them, we also aimed to incorporate functions to predict the assay format.

Results: Our novel method identified 0.6 million articles (along with drug and protein information) that were not previously included in public DTI databases. Using 10-fold cross-validation, we obtained ~99% accuracy for identifying articles containing quantitative drug-target profiles. The micro-averaged F1 score for the prediction of assay format is 88%, which leaves room for improvement in future studies.
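
For readers unfamiliar with the two evaluation terms used in the Results, the following is a minimal sketch of how a micro-averaged F1 score and a k-fold split can be computed. It is not the authors' evaluation code: the function names `micro_f1` and `k_fold_indices`, the contiguous (unshuffled) fold assignment, and the representation of each article's assay formats as a set of labels are all assumptions made for illustration.

```python
from typing import Hashable, Iterator, List, Sequence, Set, Tuple


def micro_f1(true_labels: Sequence[Set[Hashable]],
             pred_labels: Sequence[Set[Hashable]]) -> float:
    """Micro-averaged F1: pool TP/FP/FN over all items and labels, then compute F1.

    Each item carries a set of labels (hypothetical representation), so the
    same function covers single-label and multi-label prediction.
    """
    if len(true_labels) != len(pred_labels):
        raise ValueError("true_labels and pred_labels must have the same length")
    tp = fp = fn = 0
    for truth, pred in zip(true_labels, pred_labels):
        tp += len(truth & pred)   # labels predicted and correct
        fp += len(pred - truth)   # labels predicted but wrong
        fn += len(truth - pred)   # labels missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def k_fold_indices(n_items: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for k contiguous folds.

    Assumption: no shuffling or stratification; real pipelines usually
    shuffle the data before splitting.
    """
    if not 1 < k <= n_items:
        raise ValueError("k must satisfy 1 < k <= n_items")
    # The first (n_items % k) folds receive one extra item.
    fold_sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_items))
        yield train, test
        start = stop
```

With 10-fold cross-validation, `k_fold_indices(n, 10)` would hold out each tenth of the labeled articles exactly once, and a metric such as `micro_f1` would be computed on each held-out fold and then averaged. Note that when every item has exactly one true and one predicted label, micro-averaged F1 coincides with accuracy.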

Conclusion: The BERT model in this study is robust and the proposed pipeline can be used to identify previously overlooked articles containing quantitative DTIs. Overall, our method provides a significant advancement in machine-assisted DTI extraction and curation. We expect it to be a useful addition to drug mechanism discovery and repurposing.

Keywords: BERT; BERT for biomedical data; Bidirectional encoder representations from transformers; Bioactivity data; Biomedical text mining; Drug repurposing; Drug target interaction prediction; Mining drug target interactions.

MeSH terms

  • Databases, Factual
  • Drug Interactions
  • Drug Repositioning*
  • Proteins* / metabolism
  • PubMed

Substances

  • Proteins