Data Integration Using Advances in Machine Learning in Drug Discovery and Molecular Biology

Methods Mol Biol. 2021;2190:167-184. doi: 10.1007/978-1-0716-0826-5_7.

Abstract

While the term artificial intelligence and the concept of deep learning are not new, recent advances in high-performance computing, the availability of the large annotated data sets required for training, and novel frameworks for implementing deep neural networks have led to an unprecedented acceleration of the fields of molecular (network) biology and pharmacogenomics. The need to align biological data with innovative machine learning has stimulated developments in both data integration (fusion) and knowledge representation, in the form of heterogeneous, multiplex, and biological networks or graphs. In this chapter we briefly introduce several popular neural network architectures used in deep learning, namely, the fully connected deep neural network, the recurrent neural network, the convolutional neural network, and the autoencoder. Deep learning predictors, classifiers, and generators used in modern feature extraction may well assist interpretability and thus make AI tools more explainable, potentially adding insights and advances in novel chemistry and biology discovery. The capability of learning representations directly from structures, without using any predefined structure descriptor, is an important feature distinguishing deep learning from other machine learning methods, and it makes the traditional feature selection and reduction procedures unnecessary. We briefly show how these technologies are applied to data integration (fusion) and analysis in drug discovery research, covering these areas: (1) application of convolutional neural networks to predict ligand-protein interactions; (2) application of deep learning in compound property and activity prediction; (3) de novo design through deep learning.
We also: (1) discuss some aspects of the future development of deep learning in drug discovery and chemistry; (2) provide references to published information; (3) provide recently advocated recommendations on using artificial intelligence and deep learning in -omics research and drug discovery.
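
The idea of learning a representation directly from a structure, rather than from a predefined descriptor, can be illustrated with a minimal Python sketch. It is not taken from the chapter: the function names, the toy SMILES alphabet, the filter count, and the filter width are all assumptions made for illustration, and the randomly initialized filters stand in for weights that a real model would learn during training. The sketch one-hot encodes the characters of a SMILES string, slides one-dimensional convolution filters along the sequence, and keeps the maximum response of each filter as a fixed-length feature vector.

```python
import random


def one_hot_smiles(smiles, alphabet):
    """Encode each character of a SMILES string as a one-hot vector."""
    index = {ch: i for i, ch in enumerate(alphabet)}
    return [
        [1.0 if index[ch] == i else 0.0 for i in range(len(alphabet))]
        for ch in smiles
    ]


def conv1d_max_pool(encoded, filters):
    """Slide each filter along the sequence, apply ReLU, keep the maximum."""
    features = []
    for filt in filters:  # each filter has shape: width x alphabet size
        width = len(filt)
        best = 0.0  # starting at zero acts as the ReLU floor
        for start in range(len(encoded) - width + 1):
            response = sum(
                filt[k][j] * encoded[start + k][j]
                for k in range(width)
                for j in range(len(encoded[0]))
            )
            best = max(best, response)
        features.append(best)
    return features


def embed(smiles, alphabet="CNO()=c1", n_filters=4, width=3, seed=0):
    """Map a SMILES string to a fixed-length vector (hypothetical helper).

    The filters are random placeholders for trained weights; the toy
    alphabet covers only a few SMILES symbols and is an assumption.
    """
    rng = random.Random(seed)
    filters = [
        [[rng.uniform(-1.0, 1.0) for _ in alphabet] for _ in range(width)]
        for _ in range(n_filters)
    ]
    return conv1d_max_pool(one_hot_smiles(smiles, alphabet), filters)


if __name__ == "__main__":
    # Acetic acid and benzene written as SMILES strings.
    print(embed("CC(=O)O"))
    print(embed("c1ccccc1"))
```

Molecules of different lengths map to vectors of the same size, which is what lets a downstream predictor consume them without hand-crafted descriptors. A character outside the toy alphabet raises a `KeyError`, so a practical encoder would need a complete vocabulary.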

Keywords: Autoencoder; Chemistry; Convolution graph network; Data integration; Deep learning; Deep neural network; Drug discovery; Embeddings; Network biology; Recurrent neural network.

Publication types

  • Review

MeSH terms

  • Artificial Intelligence
  • Databases, Genetic
  • Deep Learning
  • Drug Discovery / methods*
  • Humans
  • Machine Learning
  • Molecular Biology / methods*
  • Neural Networks, Computer