Discovering Content through Text Mining for a Synthetic Biology Knowledge System

ACS Synth Biol. 2022 Jun 17;11(6):2043-2054. doi: 10.1021/acssynbio.1c00611. Epub 2022 Jun 7.

Abstract

Scientific articles contain a wealth of information about experimental methods and results describing biological designs. Because text is unstructured and subject to multiple sources of ambiguity and variability, extracting this information from it is a difficult task. In this paper, we describe the development of the synthetic biology knowledge system (SBKS) text processing pipeline. The pipeline uses natural language processing techniques to extract and correlate information from the literature for synthetic biology researchers. Specifically, we apply named entity recognition, relation extraction, concept grounding, and topic modeling to published literature in order to link articles to elements within our knowledge system. Our results show the efficacy of each of the components on synthetic biology literature and provide future directions for further advancement of the pipeline.
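
As a rough illustration of how three of the named steps (named entity recognition, concept grounding, and relation extraction) can fit together, the following is a minimal Python sketch. It is not the SBKS implementation, which the abstract does not detail: the lexicon, the "EX:" identifiers, and the function names are all hypothetical, recognition is a plain dictionary lookup, and relations are approximated by sentence-level co-occurrence. Topic modeling is omitted.

```python
import re
from dataclasses import dataclass

# Hypothetical lexicon mapping surface forms to made-up concept identifiers.
# These are illustrative placeholders, not real SBKS or ontology entries.
LEXICON = {
    "gfp": "EX:0001",
    "green fluorescent protein": "EX:0001",
    "plac": "EX:0002",
    "laci": "EX:0003",
}


@dataclass(frozen=True)
class Mention:
    text: str        # surface string as it appears in the sentence
    start: int       # character offset where the mention begins
    end: int         # character offset just past the mention
    concept_id: str  # identifier the mention is grounded to


def recognize_and_ground(sentence):
    """Dictionary-based entity recognition with grounding via LEXICON.

    Assumes ASCII-like text, so lowercasing keeps character offsets aligned.
    """
    lowered = sentence.lower()
    mentions = []
    # Try longer surface forms first so they win over overlapping shorter ones.
    for surface in sorted(LEXICON, key=len, reverse=True):
        pattern = r"\b" + re.escape(surface) + r"\b"
        for m in re.finditer(pattern, lowered):
            overlaps = any(m.start() < x.end and x.start < m.end() for x in mentions)
            if not overlaps:
                mentions.append(
                    Mention(sentence[m.start():m.end()], m.start(), m.end(), LEXICON[surface])
                )
    return sorted(mentions, key=lambda x: x.start)


def cooccurrence_relations(mentions):
    """Candidate relations: ordered, de-duplicated pairs of distinct concepts
    that appear in the same sentence. A crude stand-in for a trained
    relation extraction model."""
    pairs = []
    for i, a in enumerate(mentions):
        for b in mentions[i + 1:]:
            pair = (a.concept_id, b.concept_id)
            if a.concept_id != b.concept_id and pair not in pairs:
                pairs.append(pair)
    return pairs


if __name__ == "__main__":
    sentence = "LacI represses pLac, which drives green fluorescent protein (GFP)."
    found = recognize_and_ground(sentence)
    for mention in found:
        print(mention)
    print(cooccurrence_relations(found))
```

In this sketch the full name and the abbreviation "GFP" ground to the same placeholder identifier, which is the kind of variability that concept grounding is meant to resolve.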

Keywords: concept grounding; named entity recognition; natural language processing; relation extraction; synthetic biology text processing pipeline; topic modeling.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Data Mining* / methods
  • Natural Language Processing
  • Synthetic Biology*