Text Processing

Adv Exp Med Biol. 2019;1137:45-60. doi: 10.1007/978-3-030-13845-5_4.

Abstract

In the previous chapter we automatically processed structured data to retrieve biomedical text about a chemical compound, such as caffeine. This chapter provides a step-by-step introduction to processing that text with shell script commands, specifically to extract information about diseases related to caffeine. The goal is to equip the reader with an essential set of skills for extracting meaningful information from any text.
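As a rough illustration of the kind of shell-based word matching described above, the sketch below filters sentences that mention caffeine and reports which terms from a small disease lexicon occur in them. The sample text, the lexicon, and the function name `find_diseases` are all hypothetical and chosen only for this example; they are not taken from the chapter, and the approach shown (fixed-string matching with `grep`) is just one simple option. The `-o` and `-w` options are assumed to be available, as they are in GNU and BSD `grep`.

```shell
# Hypothetical sample text standing in for retrieved biomedical abstracts.
text='Caffeine intake has been studied in relation to migraine.
High doses of caffeine may worsen anxiety and insomnia.
Coffee is a popular beverage.'

# Hypothetical disease lexicon, one term per line.
lexicon='migraine
anxiety
insomnia
asthma'

# find_diseases: list lexicon terms found on lines that mention caffeine.
find_diseases() {
  printf '%s\n' "$text" |
    grep -i 'caffeine' |              # keep only lines mentioning caffeine
    grep -o -i -w -F "$lexicon" |     # print each whole-word lexicon match
    tr '[:upper:]' '[:lower:]' |      # normalize case
    sort -u                           # remove duplicates
}

find_diseases
```

With the sample data above, the function prints `anxiety`, `insomnia`, and `migraine`, one per line; `asthma` is absent because it never appears in the text. Matching whole words against a fixed lexicon is easy to follow but misses spelling variants and multi-sentence relations, which is where regular expressions and tokenization become useful.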

Keywords: Evaluation metrics; NER: Named-Entity Recognition; NLP: Natural Language Processing; Pattern matching; Regular expressions; Relation extraction; String matching; Text mining; Tokenization; Word matching.

MeSH terms

  • Caffeine
  • Data Mining / methods*
  • Electronic Data Processing*
  • Software

Substances

  • Caffeine