Core Statistical Methods for Chemogenomic Data

Methods Mol Biol. 2018;1825:227-277. doi: 10.1007/978-1-4939-8639-2_7.

Abstract

Chemogenomic modeling involves building algorithmic or statistical models that make predictions on new input data, and it often relies on noisy, multidescriptor data. A deeper understanding of such data through statistical analysis can support informed study design and increase the information gained from prediction results and model performance. This chapter introduces basic statistical concepts and provides step-by-step instructions for exploring and visualizing chemogenomic data with the statistics-centered, open-source software R. It gives directions for carrying out essential techniques such as calculating correlations, testing hypotheses, and clustering.
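The chapter carries out these analyses in R (for example with `cor()` and `cor.test()`). Purely as an illustration of two of the named techniques, the sketch below computes Pearson and Spearman correlations between a descriptor and an activity and runs a permutation test as one possible hypothesis test. It uses only the Python standard library; the function names and the descriptor/activity values are hypothetical and not taken from the chapter, and the permutation test is a swapped-in technique, not necessarily the test the authors use.

```python
import math
import random

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(v):
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def permutation_pvalue(x, y, n_perm=2000, seed=0):
    """Two-sided permutation test of H0: no association between x and y."""
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # break any pairing between x and y
        if abs(pearson(x, y_perm)) >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return (hits + 1) / (n_perm + 1)

# Hypothetical example: a descriptor (e.g. logP) versus activity (pIC50).
logp = [1.2, 2.5, 3.1, 0.8, 4.0]
pic50 = [5.1, 6.0, 6.4, 4.9, 7.2]

r_pearson = pearson(logp, pic50)
r_spearman = spearman(logp, pic50)
p_value = permutation_pvalue(logp, pic50)
print(f"Pearson r = {r_pearson:.3f}, Spearman rho = {r_spearman:.3f}, p = {p_value:.4f}")
```

Spearman's coefficient works on ranks, so it does not assume normally distributed data and is less sensitive to outliers than Pearson's; checking normality first is one way to decide which measure suits a given descriptor.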

Keywords: Chemogenomic data; Clustering; Correlation; Feature importance; Hypothesis testing; Normality.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Data Visualization
  • Databases, Factual
  • Genomics / methods*
  • Humans
  • Models, Statistical*
  • Pharmaceutical Preparations / chemistry*
  • Software*

Substances

  • Pharmaceutical Preparations