Anonymous Pattern Molecular Fingerprint and its Applications on Property Identification

IEEE/ACM Trans Comput Biol Bioinform. 2023 Nov-Dec;20(6):3759-3771. doi: 10.1109/TCBB.2023.3322697. Epub 2023 Dec 25.

Abstract

Molecular fingerprints are important cheminformatics tools that map molecules into a vector space according to characteristics such as functional groups, atom sequences, and other topological structures. In this paper, we investigate a novel molecular fingerprint, Anonymous-FP, that captures the underlying interactions formed in small-, medium-, and large-scale atom chains. Specifically, the possible atom chains of each molecule are sampled and encoded as anonymous atom chains using an anonymous encoding scheme. The fingerprint is then embedded into a vector space using PV-DBOW, a technique from natural language processing. Anonymous-FP is evaluated on molecular property identification through molecule classification experiments on a series of molecule databases, and it shows advantages such as less dependence on prior knowledge, rich information content, full structural significance, and high experimental performance. The experiments show that the scale of the atom chain, or of its anonymous pattern, significantly affects the overall representational ability of Anonymous-FP. In general, the typical scale r = 8 improves molecule classification performance; specifically, Anonymous-FP achieves classification accuracy above 93% on all NCI datasets.
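The abstract does not spell out the anonymous encoding, so the following is only a minimal sketch under a stated assumption: that it follows the usual anonymous-walk convention, in which each atom in a chain is replaced by the index of its first appearance in that chain. The function names (`anonymize`, `enumerate_chains`, `pattern_counts`), the `length` parameter, and the toy adjacency list are all hypothetical and are not taken from the paper. The sketch enumerates chains exhaustively on a tiny graph, whereas the paper samples them.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple


def anonymize(chain: Sequence[Hashable]) -> Tuple[int, ...]:
    """Replace each atom by the index of its first appearance in the chain.

    Assumed convention (anonymous walks); the paper's exact scheme may differ.
    """
    first_seen: Dict[Hashable, int] = {}
    pattern: List[int] = []
    for atom in chain:
        if atom not in first_seen:
            first_seen[atom] = len(first_seen)
        pattern.append(first_seen[atom])
    return tuple(pattern)


def enumerate_chains(adjacency: Dict[int, List[int]], length: int) -> List[List[int]]:
    """List every walk that visits `length` atoms (revisiting atoms is allowed)."""
    chains = [[start] for start in adjacency]
    for _ in range(length - 1):
        chains = [chain + [nxt] for chain in chains for nxt in adjacency[chain[-1]]]
    return chains


def pattern_counts(adjacency: Dict[int, List[int]], length: int) -> Counter:
    """Count how often each anonymous pattern occurs among the chains."""
    return Counter(anonymize(chain) for chain in enumerate_chains(adjacency, length))


# Hypothetical toy molecule: heavy-atom skeleton of ethanol, C0 - C1 - O2.
ETHANOL = {0: [1], 1: [0, 2], 2: [1]}
ETHANOL_PATTERNS = pattern_counts(ETHANOL, length=3)
```

Under this assumption, each molecule becomes a bag of anonymous patterns, which could then be treated as the "words" of a document when training a paragraph-vector model such as PV-DBOW. The embedding step is omitted here because the abstract gives no training details.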

MeSH terms

  • Cheminformatics*
  • Databases, Chemical*