Representing structural databases in a self-organizing map

Acta Crystallogr B. 2005 Oct;61(Pt 5):548-57. doi: 10.1107/S0108768105020331. Epub 2005 Sep 23.

Abstract

This paper presents a way to accommodate large numbers of crystal structures, such as those in the Cambridge Structural Database (CSD), in a self-organizing map. The structures are represented by their calculated powder diffraction patterns. Essential to the approach is a recently introduced similarity criterion, the weighted cross-correlation. It accurately reflects the similarity of powder patterns and therefore indirectly measures the resemblance of crystal packings. Good results are obtained even if the network is trained with only a small subset of a complete database, which makes it possible to construct the map on common hardware in a few hours. Besides offering several possibilities for two-dimensional visualization, such a map has a number of important applications. Two of these are fast and easy screening of a database and providing an overview of its contents in terms of the structural diversity of specific chemical classes of compounds, e.g. steroids or peptides. A third is the selection of archetypical structures that cover the complete structural space.
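
To make the idea concrete, the following Python sketch shows one way a weighted cross-correlation could serve as the similarity measure when training a small self-organizing map on powder patterns. It is an illustration under stated assumptions, not the authors' implementation. The abstract does not specify the triangular weighting function, its width (`width`, in grid points), the Gaussian neighbourhood, the learning-rate schedule, or the function names `wcc`, `best_matching_unit` and `train_som`; all of these are choices made here. Patterns are assumed to be non-negative intensity vectors sampled on a common 2θ grid, with `width` no larger than the number of grid points.

```python
import numpy as np


def wcc(f, g, width):
    """Weighted cross-correlation of two patterns on the same grid.

    Assumption: a triangular weight 1 - |lag|/width over shifts |lag| < width.
    """
    f = np.asarray(f, dtype=float)
    g = np.asarray(g, dtype=float)
    lags = np.arange(-width + 1, width)        # shifts with nonzero weight
    weights = 1.0 - np.abs(lags) / width       # triangular weighting function

    def weighted_corr(a, b):
        full = np.correlate(a, b, mode="full")  # zero lag sits at index len(a) - 1
        centre = len(a) - 1
        return np.sum(weights * full[centre + lags])

    return weighted_corr(f, g) / np.sqrt(weighted_corr(f, f) * weighted_corr(g, g))


def best_matching_unit(units, pattern, width):
    """Index of the map unit most similar to the pattern."""
    return int(np.argmax([wcc(u, pattern, width) for u in units]))


def train_som(patterns, rows, cols, width, epochs=10, seed=0):
    """Train a rows x cols map; returns an array of unit patterns.

    Assumption: a standard online SOM update with a Gaussian neighbourhood
    and linearly decaying learning rate and radius.
    """
    rng = np.random.default_rng(seed)
    patterns = np.asarray(patterns, dtype=float)
    # Initialise every unit with a randomly chosen training pattern.
    units = patterns[rng.integers(len(patterns), size=rows * cols)].copy()
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs
        rate = 0.5 * frac
        radius = max(rows, cols) / 2.0 * frac + 0.5
        for p in patterns[rng.permutation(len(patterns))]:
            bmu = best_matching_unit(units, p, width)
            dist = np.linalg.norm(grid - grid[bmu], axis=1)
            h = np.exp(-(dist / radius) ** 2)   # neighbourhood strength per unit
            units += rate * h[:, None] * (p - units)
    return units
```

Unlike a point-by-point correlation, this measure gives two patterns partial credit when their peaks are slightly shifted relative to each other. That tolerance is the property that lets pattern similarity stand in for similarity of crystal packing. Because the winner search only needs the similarity function, a trained map can also be used to screen new patterns by looking up their best-matching unit.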

MeSH terms

  • Algorithms
  • Chemistry / methods*
  • Crystallography, X-Ray
  • Databases, Factual*
  • Information Storage and Retrieval
  • Internet
  • Models, Chemical
  • Peptides / chemistry
  • Powders
  • Software
  • Steroids / chemistry
  • User-Computer Interface
  • X-Ray Diffraction

Substances

  • Peptides
  • Powders
  • Steroids