Chemical documents: machine understanding and automated information extraction

Joe A Townsend; Sam E Adams; Christopher A Waudby; Vanessa K de Souza; Jonathan M Goodman; Peter Murray-Rust

doi:10.1039/b411033a

Org Biomol Chem. 2004 Nov 21;2(22):3294-300. doi: 10.1039/b411033a. Epub 2004 Oct 20.

Authors

Joe A Townsend¹, Sam E Adams, Christopher A Waudby, Vanessa K de Souza, Jonathan M Goodman, Peter Murray-Rust

Affiliation

¹ Unilever Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, UK.

PMID: 15534707

Abstract

Automatically extracting chemical information from documents is a challenging task, but an essential one for dealing with the vast quantity of data that is available. The task is least difficult for structured documents, such as chemistry department web pages or the output of computational chemistry programs, but requires increasingly sophisticated approaches for less structured documents, such as chemical papers. The identification of key units of information, such as chemical names, makes the extraction of useful information from unstructured documents possible.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Chemistry / methods*
Electronic Data Processing / methods*
Internet
Software*
Terminology as Topic