French dispatch: GTM-based analysis of the Chimiothèque Nationale Chemical Space

Mol Inform. 2023 Apr;42(4):e2200208. doi: 10.1002/minf.202200208. Epub 2023 Feb 6.

Abstract

To analyze the Chimiothèque Nationale (CN), the French National Compound Library, in the context of screening and biologically relevant compounds, the library was compared with the ZINC in-stock collection and ChEMBL. The comparison covers chemical space coverage, physicochemical properties and Bemis-Murcko (BM) scaffold populations. More than 5,000 CN-unique scaffolds (relative to the ZINC and ChEMBL collections) were identified. Generative Topographic Maps (GTMs) accommodating those libraries were generated and used to compare the compound populations. Hierarchical GTM ("zooming") was applied to generate an ensemble of maps at various resolution levels, from a global overview to precise mapping of individual structures. The respective maps were added to the ChemSpace Atlas website. The analysis of synthetic accessibility in the context of combinatorial chemistry showed that only 29.7% of CN compounds can be fully synthesized using commercially available building blocks.
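
As an illustration of what "CN-unique scaffolds" means, the following minimal sketch treats each library as a set of scaffold strings and keeps those found in CN but in neither reference collection. It assumes the Bemis-Murcko scaffolds have already been extracted and written as canonical SMILES by a cheminformatics toolkit. The function name `unique_scaffolds` and the toy SMILES are hypothetical and are not taken from the paper, and this is not presented as the authors' actual pipeline.

```python
def unique_scaffolds(target, *references):
    """Return scaffolds present in `target` but in none of the reference sets."""
    # Pool every scaffold seen in any reference collection
    seen_elsewhere = set().union(*references)
    return set(target) - seen_elsewhere


# Toy canonical scaffold SMILES, invented for illustration only (assumption)
cn = {"c1ccccc1", "c1ccc2ccccc2c1", "C1CCC2(CC1)OCCO2"}
zinc = {"c1ccccc1", "c1ccncc1"}
chembl = {"c1ccc2ccccc2c1"}

cn_only = unique_scaffolds(cn, zinc, chembl)
print(sorted(cn_only))  # ['C1CCC2(CC1)OCCO2']
```

The comparison is only meaningful if all three collections are canonicalized the same way, since the set difference relies on exact string equality.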

Keywords: ChEMBL; Generative Topographic Mapping; ZINC; chemical space; Chimiothèque Nationale.

MeSH terms

  • Databases, Chemical*