A Java API for working with PubChem datasets

Bioinformatics. 2011 Mar 1;27(5):741-2. doi: 10.1093/bioinformatics/btq715. Epub 2011 Jan 6.

Abstract

PubChem is a public repository of chemical structures and associated biological activities. The PubChem BioAssay database contains assay descriptions, conditions and readouts and biological screening results that have been submitted by the biomedical research community. The PubChem web site and Power User Gateway (PUG) web service allow users to interact with the data and raw files are available via FTP. These resources are helpful to many but there can also be great benefit by using a software API to manipulate the data. Here, we describe a Java API with entity objects mapped to the PubChem Schema and with wrapper functions for calling the NCBI eUtilities and PubChem PUG web services. PubChem BioAssays and associated chemical compounds can then be queried and manipulated in a local relational database. Features include chemical structure searching and generation and display of curve fits from stored dose-response experiments, something that is not yet available within PubChem itself. The aim is to provide researchers with a fast, consistent, queryable local resource from which to manipulate PubChem BioAssays in a database agnostic manner. It is not intended as an end user tool but to provide a platform for further automation and tools development.

Availability: http://code.google.com/p/pubchemdb.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Computational Biology / methods
  • Databases, Factual*
  • Information Storage and Retrieval / methods
  • Internet*
  • Programming Languages
  • Software*