MolTalk--a programming library for protein structures and structure analysis

BMC Bioinformatics. 2004 Apr 19:5:39. doi: 10.1186/1471-2105-5-39.

Abstract

Background: Two of the mostly unsolved but increasingly urgent problems for modern biologists are a) to quickly and easily analyse protein structures and b) to comprehensively mine the wealth of information, which is distributed along with the 3D co-ordinates by the Protein Data Bank (PDB). Tools which address this issue need to be highly flexible and powerful but at the same time must be freely available and easy to learn.

Results: We present MolTalk, an elaborate programming language, which consists of the programming library libmoltalk implemented in Objective-C and the Smalltalk-based interpreter MolTalk. MolTalk combines the advantages of an easy to learn and programmable procedural scripting with the flexibility and power of a full programming language. An overview of currently available applications of MolTalk is given and with PDBChainSaw one such application is described in more detail. PDBChainSaw is a MolTalk-based parser and information extraction utility of PDB files. Weekly updates of the PDB are synchronised with PDBChainSaw and are available for free download from the MolTalk project page http://www.moltalk.org following the link to PDBChainSaw. For each chain in a protein structure, PDBChainSaw extracts the sequence from its co-ordinates and provides additional information from the PDB-file header section, such as scientific organism, compound name, and EC code.

Conclusion: MolTalk provides a rich set of methods to analyse and even modify experimentally determined or modelled protein structures. These methods vary in complexity and are thus suitable for beginners and advanced programmers alike. We envision MolTalk to be most valuable in the following applications:1) To analyse protein structures repetitively in large-scale, i.e. to benchmark protein structure prediction methods or to evaluate structural models. The quality of the resulting 3D-models can be assessed by e.g. calculating a Ramachandran-Sasisekharan plot.2) To quickly retrieve information for (a limited number of) macro-molecular structures, i.e. H-bonds, salt bridges, contacts between amino acids and ligands or at the interface between two chains.3) To programme more complex structural bioinformatics software and to implement demanding algorithms through its portability to Objective-C, e.g. iMolTalk.4) To be used as a front end to databases, e.g. PDBChainSaw.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Artificial Intelligence
  • Computational Biology / methods
  • Database Management Systems*
  • Internet
  • Programming Languages
  • Protein Conformation*
  • Software Design
  • Software*