Machine Learning from Omics Data

Methods Mol Biol. 2022:2390:421-431. doi: 10.1007/978-1-0716-1787-8_18.

Abstract

Machine learning (ML) already accelerates discoveries in many scientific fields and is the driver behind several new products. Recently, growing sample sizes have enabled the use of ML approaches in larger omics studies. This work provides a guide through a typical analysis of an omics dataset using ML. As an example, this chapter demonstrates how to build a model that predicts Drug-Induced Liver Injury (DILI) from transcriptomics data contained in the LINCS L1000 dataset. Each section covers best practices and pitfalls, from data exploration through model training, including hyperparameter search, to validation and analysis of the final model. The code to reproduce the results is available at https://github.com/Evotec-Bioinformatics/ml-from-omics.
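
The workflow named in the abstract (model training, hyperparameter search, validation on held-out data) can be illustrated with a minimal sketch. This is not the chapter's code, which lives in the repository linked above. It assumes scikit-learn and NumPy are installed, and it uses a synthetic expression-like matrix with made-up labels in place of LINCS L1000 profiles and DILI annotations. The grid values, sample counts, and variable names are illustrative assumptions.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for an expression matrix (samples x genes).
# Assumption: real data would be LINCS L1000 signatures with DILI labels.
rng = np.random.default_rng(0)
n_samples, n_genes = 200, 50
X = rng.normal(size=(n_samples, n_genes))
# Hypothetical binary label driven by the first five "genes" plus noise.
y = (X[:, :5].sum(axis=1) + rng.normal(scale=0.5, size=n_samples) > 0).astype(int)

# Hold out a test set that the hyperparameter search never sees.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scaling sits inside the pipeline so it is re-fit on each training fold,
# which avoids leaking statistics from the validation fold.
pipeline = Pipeline([("scale", StandardScaler()), ("svm", SVC())])

# Small illustrative grid; the values are assumptions, not the chapter's.
param_grid = {"svm__C": [0.1, 1, 10], "svm__kernel": ["linear", "rbf"]}

search = GridSearchCV(
    pipeline,
    param_grid,
    scoring="roc_auc",
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X_train, y_train)

# Evaluate the refit best model once on the held-out test set.
test_auc = roc_auc_score(y_test, search.decision_function(X_test))
print("best parameters:", search.best_params_)
print("held-out ROC AUC:", round(test_auc, 3))
```

With real omics data the splitting step usually needs more care than shown here, for example keeping all profiles of one compound in the same fold so that the test score is not inflated.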

Keywords: Artificial intelligence; DILI; Drug discovery; Drug-Induced Liver Injury; Machine learning; SVM; Support vector machine; Transcriptomics.

MeSH terms

  • Chemical and Drug Induced Liver Injury
  • Humans
  • Machine Learning*
  • Support Vector Machine