Release of (and lessons learned from mining) a pioneering large toxicogenomics database

Pharmacogenomics. 2015 Jul;16(8):779-801. doi: 10.2217/pgs.15.38. Epub 2015 Jun 12.

Abstract

Aim: We release the Janssen Toxicogenomics database. This rat liver gene-expression database was generated using Codelink microarrays, and has been used over the past years within Janssen to derive signatures for multiple end points and to classify proprietary compounds.

Materials & methods: The release consists of gene-expression responses to 124 compounds, selected to give broad coverage of liver-active compounds. A subset of the compounds was also analyzed on Affymetrix microarrays.

Results: The release includes the results of an in-house pipeline that reannotates probes against Entrez Gene and classifies them into confidence classes. High-confidence, unambiguously annotated probes were used to create gene-level data, which served as the starting point for cross-platform comparisons. Connectivity map-based similarity methods show excellent agreement between Codelink and Affymetrix runs of the same samples. We also compared our dataset with the Japanese Toxicogenomics Project and observed reasonable agreement, especially for compounds with stronger gene signatures. We describe an R package containing the gene-level data and show how it can be used for expression-based similarity searches.
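
To illustrate what a connectivity map-based similarity search involves, the following is a minimal Python sketch of a Kolmogorov-Smirnov-style connectivity score of the kind introduced with the original Connectivity Map. It is not the authors' implementation, and it does not use the R package described above. The function names (`ks_score`, `connectivity_score`) and the input layout (a gene-to-log-ratio mapping for the reference profile, plus lists of up- and down-regulated query genes) are assumptions made for this example.

```python
def ks_score(tag_ranks, n_genes):
    """Signed Kolmogorov-Smirnov-style enrichment of a tag set.

    tag_ranks: 1-based ranks of the query genes in a reference profile
               ordered from most up-regulated to most down-regulated.
    n_genes:   total number of genes in the reference profile.
    Positive values mean the tags cluster near the top of the ranking.
    """
    v = sorted(tag_ranks)
    t = len(v)
    a = max((j + 1) / t - v[j] / n_genes for j in range(t))
    b = max(v[j] / n_genes - j / t for j in range(t))
    return a if a > b else -b


def connectivity_score(reference, up_genes, down_genes):
    """Similarity between a query signature and one reference profile.

    reference:  dict mapping gene identifier -> log ratio (hypothetical layout).
    up_genes:   query genes expected to be induced.
    down_genes: query genes expected to be repressed.
    Returns 0.0 when both tag sets are enriched in the same direction.
    """
    ordered = sorted(reference, key=reference.get, reverse=True)
    rank = {gene: i + 1 for i, gene in enumerate(ordered)}
    n_genes = len(ordered)

    # Ignore query genes that are not measured in the reference profile.
    up = [rank[g] for g in up_genes if g in rank]
    down = [rank[g] for g in down_genes if g in rank]
    if not up or not down:
        return 0.0

    ks_up = ks_score(up, n_genes)
    ks_down = ks_score(down, n_genes)
    if (ks_up > 0) == (ks_down > 0):
        return 0.0
    return ks_up - ks_down
```

In this sketch, a positive score means the reference profile induces the query's up genes and represses its down genes, and a negative score means the reverse. Scoring gene-level rather than probe-level data is what allows the same query to be compared across Codelink and Affymetrix profiles.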

Conclusion: When the same biological samples are run on the Affymetrix and Codelink platforms, connectivity mapping approaches show good correspondence. As expected, this correspondence is weaker when the data are compared with an independent dataset such as TG-GATE. We hope that users will incorporate this collection of gene-expression profiles into their toxicogenomics pipelines.

Keywords: connectivity map; database; hepatotoxicity; liver; microarray; rat; toxicogenomics.

MeSH terms

  • Animals
  • Data Mining
  • Databases, Factual*
  • Humans
  • Liver / drug effects
  • Liver / metabolism*
  • Oligonucleotide Array Sequence Analysis / methods
  • Rats
  • Toxicogenetics*
  • Transcriptome