The influence of different structure representations on the clustering of an RNA nucleotides data set

J Chem Inf Comput Sci. 2001 Sep-Oct;41(5):1388-94. doi: 10.1021/ci0103626.

Abstract

Over the last couple of years, an overwhelming amount of data has emerged in the field of biomolecular structure determination. Clustering techniques can be used to explore the information hidden in these structure databases. The outcome of a clustering experiment depends largely, among other factors, on how the data are represented; the choice of how to represent the molecular structure information is therefore extremely important. This article describes how the different representations influence the clustering and how that influence can be analyzed by means of a dendrogram comparison method. All experiments are performed using a data set consisting of RNA trinucleotides. In addition to the most basic structure representation, the Cartesian coordinates representation, several other structure representations are used.
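The idea of clustering one data set under two representations and then comparing the resulting dendrograms can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the four toy "structures" are hypothetical three-point fragments rather than RNA trinucleotides, the second representation (intra-structure distances) is only one possible alternative to Cartesian coordinates, single linkage is an arbitrary choice of clustering method, and the correlation of cophenetic distances is a common dendrogram similarity measure substituted here because the abstract does not say which comparison method the article uses.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cartesian(structure):
    """Representation 1: the raw x, y, z coordinates, flattened."""
    return [c for atom in structure for c in atom]


def internal_distances(structure):
    """Representation 2 (assumed alternative): all atom-atom distances."""
    return [euclidean(a, b) for a, b in combinations(structure, 2)]


def cophenetic(vectors):
    """Single-linkage clustering; returns {(i, j): height where i and j merge}."""
    n = len(vectors)
    dist = {p: euclidean(vectors[p[0]], vectors[p[1]])
            for p in combinations(range(n), 2)}
    clusters = {i: [i] for i in range(n)}
    coph = {}

    def linkage(a, b):
        # Single linkage: smallest distance between members of the two clusters.
        return min(dist[(min(i, j), max(i, j))]
                   for i in clusters[a] for j in clusters[b])

    while len(clusters) > 1:
        a, b = min(combinations(sorted(clusters), 2), key=lambda p: linkage(*p))
        height = linkage(a, b)
        for i in clusters[a]:
            for j in clusters[b]:
                coph[(min(i, j), max(i, j))] = height
        clusters[a] += clusters.pop(b)
    return coph


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def dendrogram_similarity(coph_a, coph_b):
    """Correlate the cophenetic distances of two dendrograms over the same items."""
    pairs = sorted(coph_a)
    return pearson([coph_a[p] for p in pairs], [coph_b[p] for p in pairs])


# Hypothetical toy structures; structure 1 is structure 0 translated by (5, 5, 5).
structures = [
    [(0, 0, 0), (1, 0, 0), (1, 1, 0)],
    [(5, 5, 5), (6, 5, 5), (6, 6, 5)],
    [(0, 0, 0), (2, 0, 0), (2, 2, 0)],
    [(0, 0, 0), (1, 0, 0), (1, 3, 0)],
]

coph_cart = cophenetic([cartesian(s) for s in structures])
coph_int = cophenetic([internal_distances(s) for s in structures])
r = dendrogram_similarity(coph_cart, coph_int)
print(f"cophenetic correlation between the two dendrograms: {r:.3f}")
```

In this toy case the translated copy merges with its original at height zero under the distance-based representation but is the last item to join under Cartesian coordinates, so the two dendrograms disagree. This is the kind of representation effect the abstract refers to, shown here only on made-up data.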

Publication types

  • Comparative Study

MeSH terms

  • Cluster Analysis
  • Computer Simulation
  • Databases, Nucleic Acid*
  • Molecular Structure
  • Oligoribonucleotides / chemistry
  • RNA / chemistry*

Substances

  • Oligoribonucleotides
  • RNA