Molecular Cavity Topological Representation for Pattern Analysis: A NLP Analogy-Based Word2Vec Method

Int J Mol Sci. 2019 Nov 29;20(23):6019. doi: 10.3390/ijms20236019.

Abstract

Cavity analysis in molecular dynamics is important for understanding molecular function. However, analyzing the dynamic patterns of molecular cavities remains difficult. In this paper, we propose a novel method that represents molecular cavities topologically through vectorization. First, cavities are characterized with a Word2Vec model, based on an analogy between cavities and natural language processing (NLP) terms. Then, we apply techniques such as dimensionality reduction and clustering to conduct an exploratory analysis of the vectorized molecular cavities. On a real data set, we demonstrate that our approach preserves the topological characteristics of the cavities and can identify change patterns among a large number of cavities.
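The abstract does not specify how cavities are mapped onto words and sentences, so the following is only a rough illustration of the Word2Vec step under stated assumptions. It assumes that each cavity carries a discrete label (the hypothetical tokens C1 to C8) and that the cavities observed in one molecular dynamics frame form one "sentence". This is not necessarily the authors' encoding. The sketch trains a minimal skip-gram model with negative sampling in pure Python, using uniform negative sampling in place of word2vec's frequency-based sampling, and compares the resulting vectors by cosine similarity. The names `skipgram_pairs`, `train_sgns` and `cosine` are illustrative. Dimensionality reduction and clustering, the paper's next steps, would operate on the returned vectors and are omitted here.

```python
import math
import random


def skipgram_pairs(sentences, window=2):
    """Collect (center, context) pairs: each token paired with neighbours within `window`."""
    pairs = []
    for sent in sentences:
        for i, center in enumerate(sent):
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pairs.append((center, sent[j]))
    return pairs


def train_sgns(sentences, dim=8, window=2, negatives=3, epochs=100, lr=0.05, seed=0):
    """Skip-gram with negative sampling.

    Simplification (assumption): negatives are drawn uniformly from the vocabulary
    rather than from word2vec's unigram^0.75 distribution.
    """
    rng = random.Random(seed)
    vocab = sorted({tok for sent in sentences for tok in sent})
    idx = {tok: k for k, tok in enumerate(vocab)}
    w_in = [[rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for _ in vocab]  # token vectors
    w_out = [[0.0] * dim for _ in vocab]  # context vectors, zero-initialised
    pairs = skipgram_pairs(sentences, window)
    for _epoch in range(epochs):
        rng.shuffle(pairs)
        for center, context in pairs:
            c, pos = idx[center], idx[context]
            # One positive target plus up to `negatives` random negative targets.
            targets = [(pos, 1.0)]
            for _ in range(negatives):
                t = rng.randrange(len(vocab))
                if t != pos:
                    targets.append((t, 0.0))
            grad_in = [0.0] * dim
            for t, label in targets:
                score = sum(a * b for a, b in zip(w_in[c], w_out[t]))
                score = max(-30.0, min(30.0, score))  # keep exp() in a safe range
                g = lr * (label - 1.0 / (1.0 + math.exp(-score)))
                for d in range(dim):
                    grad_in[d] += g * w_out[t][d]  # accumulate update for the token vector
                    w_out[t][d] += g * w_in[c][d]  # update the context vector immediately
            for d in range(dim):
                w_in[c][d] += grad_in[d]
    return {tok: w_in[idx[tok]] for tok in vocab}


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical data: each "sentence" lists cavity labels observed in one MD frame.
frames = [
    ["C1", "C2", "C3"],
    ["C1", "C2", "C4"],
    ["C5", "C6", "C7"],
    ["C5", "C6", "C8"],
]
vectors = train_sgns(frames, dim=8, window=2, seed=0)
print(round(cosine(vectors["C1"], vectors["C2"]), 3), round(cosine(vectors["C1"], vectors["C5"]), 3))
```

In practice a library implementation such as gensim's `Word2Vec` would replace the hand-written training loop. It is written out here only to make the skip-gram objective explicit.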

Keywords: Word2Vec model; analogy-based methods; molecular cavity; topological representation.

MeSH terms

  • Algorithms
  • Cluster Analysis
  • Databases, Protein
  • Drug Discovery
  • Humans
  • Ligands
  • Molecular Dynamics Simulation
  • Natural Language Processing
  • Protein Conformation
  • Proteins / chemistry*
  • Software

Substances

  • Ligands
  • Proteins