Retrieval of crystallographically-derived molecular geometry information

J Chem Inf Comput Sci. 2004 Nov-Dec;44(6):2133-44. doi: 10.1021/ci049780b.

Abstract

The crystallographically determined bond length, valence angle, and torsion angle information in the Cambridge Structural Database (CSD) has many uses. However, accessing it by means of conventional substructure searching requires nontrivial user intervention. In consequence, these valuable data have been underutilized and have not been directly accessible to client applications. The situation has been remedied by the development of a new program (Mogul) for automated retrieval of molecular geometry data from the CSD. The program uses a system of keys to encode the chemical environments of fragments (bonds, valence angles, and acyclic torsions) from CSD structures. Fragments with identical keys are deemed to be chemically identical and are grouped together; the distribution of the appropriate geometrical parameter (bond length, valence angle, or torsion angle) is then computed and stored for each group. Use of a search tree indexed on key values, together with a novel similarity calculation, then enables the distribution matching any given query fragment (or the distributions most closely matching, if an adequate exact match is unavailable) to be found easily and with no user intervention. Validation experiments indicate that, with rare exceptions, search results afford precise and unbiased estimates of molecular geometrical preferences. Such estimates may be used, for example, to validate the geometries of libraries of modeled molecules or of newly determined crystal structures, or to assist structure solution from low-resolution (e.g. powder diffraction) X-ray data.
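The key-grouping and lookup scheme summarized above can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not Mogul's implementation: the fragment key is a hypothetical tuple of discrete descriptors, a plain dictionary stands in for the search tree indexed on key values, and the paper's novel similarity calculation is replaced by a simple fraction-of-matching-positions score. The `min_count` threshold used to decide whether an exact match is "adequate" is likewise an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean, stdev

# Hypothetical key: a tuple of discrete descriptors of a fragment's chemical
# environment (e.g. element symbols, bond type). Mogul's real key scheme is
# more elaborate and is not reproduced here.
Key = tuple


@dataclass
class Distribution:
    """All observed values of one geometric parameter for one key."""
    key: Key
    values: list

    @property
    def count(self):
        return len(self.values)

    @property
    def mean(self):
        return mean(self.values)

    @property
    def sd(self):
        # Sample standard deviation; reported as 0.0 for a single observation.
        return stdev(self.values) if len(self.values) > 1 else 0.0


def build_library(fragments):
    """Group (key, value) fragments with identical keys into distributions."""
    groups = defaultdict(list)
    for key, value in fragments:
        groups[key].append(value)
    return {k: Distribution(k, v) for k, v in groups.items()}


def key_similarity(a, b):
    """Fraction of key positions that agree.

    A simple stand-in measure, NOT the similarity calculation from the paper.
    Keys of different length are treated as completely dissimilar.
    """
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def search(library, query, min_count=5, top_n=3):
    """Return [(similarity, Distribution), ...] for a query key.

    If the exact key exists with at least min_count observations, return it
    alone; otherwise return the top_n most similar distributions that each
    have at least min_count observations (min_count is a hypothetical
    adequacy threshold).
    """
    exact = library.get(query)
    if exact is not None and exact.count >= min_count:
        return [(1.0, exact)]
    scored = [
        (key_similarity(query, d.key), d)
        for d in library.values()
        if d.count >= min_count
    ]
    # Sort on the similarity score only, highest first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]
```

For example, after `build_library` is fed a handful of C–C single-bond lengths, `search(lib, ("C", "C", "single"))` returns that group's distribution directly, whereas a query whose exact group holds only one observation falls back to the most similar well-populated groups. A dictionary gives constant-time exact lookup; the linear scan in the fallback path is where a real system would benefit from a tree ordered on key components.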