Chemical documents: machine understanding and automated information extraction

Org Biomol Chem. 2004 Nov 21;2(22):3294-300. doi: 10.1039/b411033a. Epub 2004 Oct 20.

Abstract

Automatically extracting chemical information from documents is a challenging task, but an essential one for dealing with the vast quantity of data that is available. The task is least difficult for structured documents, such as chemistry department web pages or the output of computational chemistry programs, but requires increasingly sophisticated approaches for less structured documents, such as chemical papers. The identification of key units of information, such as chemical names, makes the extraction of useful information from unstructured documents possible.
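The abstract's central idea is that recognising key units such as chemical names is what makes unstructured text tractable. As a rough illustration only, the sketch below tags candidate chemical names in a sentence. It combines two simple heuristics: a small dictionary lookup (gazetteer), and a regular expression for systematic names that begin with locants, such as "1,4-dioxane". This is not the method described in the paper. The lexicon contents, the `LOCANT_NAME` pattern and the `find_chemical_names` function are hypothetical choices made for this example, and a practical system would need a far larger dictionary and more robust recognition.

```python
import re

# Hypothetical mini-lexicon (gazetteer) for illustration; a real system would
# need a large, curated dictionary of chemical names.
LEXICON = {"acetone", "benzene", "toluene", "ethanol", "dichloromethane"}

# Hypothetical heuristic for systematic names that start with locants, e.g.
# "1,4-dioxane" or "2-methylpropan-1-ol": digits (optionally comma-separated),
# a hyphen, then lowercase letters, digits, commas and hyphens, ending on a letter.
LOCANT_NAME = re.compile(r"\b\d+(?:,\d+)*-[a-z][a-z\d,\-]*[a-z]\b")

# Plain alphabetic words, checked against the lexicon.
WORD = re.compile(r"[A-Za-z]+")


def find_chemical_names(text: str) -> list[str]:
    """Return candidate chemical names in the order they occur in `text`."""
    hits = []
    locant_spans = []
    for m in LOCANT_NAME.finditer(text):
        hits.append((m.start(), m.group()))
        locant_spans.append(m.span())
    for m in WORD.finditer(text):
        # Skip words that are already part of a locant-style match.
        inside = any(start <= m.start() < end for start, end in locant_spans)
        if not inside and m.group().lower() in LEXICON:
            hits.append((m.start(), m.group()))
    hits.sort()  # order by position in the text
    return [name for _, name in hits]


if __name__ == "__main__":
    sentence = ("The sample was dissolved in 1,4-dioxane and treated with "
                "2-methylpropan-1-ol, then washed with acetone and Benzene.")
    print(find_chemical_names(sentence))
    # ['1,4-dioxane', '2-methylpropan-1-ol', 'acetone', 'Benzene']
```

Both heuristics are deliberately crude. The dictionary misses any name it does not list, and the locant pattern will also match non-chemical strings such as "10-fold". These gaps help explain why the abstract says that less structured documents call for increasingly sophisticated approaches.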

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Chemistry / methods*
  • Electronic Data Processing / methods*
  • Internet
  • Software*
  • Terminology as Topic