Text Processing

Francisco M Couto

doi:10.1007/978-3-030-13845-5_4

Adv Exp Med Biol. 2019;1137:45-60. doi: 10.1007/978-3-030-13845-5_4.

Author

Francisco M Couto¹

Affiliation

¹ LASIGE, Department of Informatics, Faculdade de Ciências, Universidade de Lisboa, Lisbon, Portugal.

PMID: 31183819

Abstract

In the previous chapter we were able to automatically process structured data to retrieve biomedical text about any chemical compound, such as caffeine. This chapter provides a step-by-step introduction to processing that text with shell script commands, specifically to extract information about diseases related to caffeine. The goal is to equip the reader with an essential set of skills for extracting meaningful information from any text.

Keywords: Evaluation metrics; NER: Named-Entity Recognition; NLP: Natural Language Processing; Pattern matching; Regular expressions; Relation extraction; String matching; Text mining; Tokenization; Word matching.

MeSH terms

Caffeine
Data Mining / methods*
Electronic Data Processing*
Software

Substances

Caffeine