Genomic comparison using data mining techniques based on a possibilistic fuzzy sets model

Biosystems. 2007 Apr;88(3):343-9. doi: 10.1016/j.biosystems.2006.07.014. Epub 2006 Nov 10.

Abstract

The current abundance of genomic information stored in biological databases [Albà, M.M., Lee, D., Pearl, F.M.G., Shepherd, A.J., Martin, N., Orengo, C.A., Kellam, P., 2001. VIDA: a virus database system for the organisation of virus genome open reading frames. Nucleic Acids Res. 133-136] finally makes it feasible to apply knowledge management to the discovery of general rules governing subcellular phenomena. The primary goal of this work is to discover relationships between genes through microarray analysis. The tools used come from clustering techniques and are based mainly on Knowledge Discovery in Databases (KDD) concepts [Fayyad, U., Piatetsky-Shapiro, G., Smyth, P., 1996. From data mining to knowledge discovery in databases. AI Magazine 17(3), 37-54]. Starting from a data set, each element can be represented by a characteristic matrix that summarizes all of its attributes. In this setting, data mining is aimed at the pattern recognition of related sequences hidden in databases [Hand, D.J., Heard, N.A., 2005. Finding groups in gene expression data. J. Biomed. Biotechnol. 215-225]. Following a bottom-up approach, the next refinement is to compare the retrieved data and group similar features using dedicated clustering algorithms [Kaufman, L., Rousseeuw, P.J., 1990. Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley & Sons, New York; Forman, G., Zhang, B., 2000. Distributed data clustering can be efficient and exact. HP Laboratories Palo Alto, HPL-2000-158] driven by fuzzy logic. This lets us intuitively perceive a common denominator among various genomic families and anticipate likely future developments.
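The abstract does not state which fuzzy clustering algorithm is used. As an illustration only, the sketch below assumes possibilistic c-means (PCM, in the style of Krishnapuram and Keller), a well-known possibilistic fuzzy clustering method. It is applied to a hypothetical characteristic matrix whose rows are genes and whose columns are expression values across conditions. The toy data, the function names, the farthest-first seeding, and the use of an ordinary fuzzy c-means pass to initialize PCM are all assumptions made for this example. None of them is taken from the paper.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_mean(rows, weights):
    """Weighted average of row vectors; weights need not sum to 1."""
    total = sum(weights)
    return [sum(w * r[k] for w, r in zip(weights, rows)) / total
            for k in range(len(rows[0]))]


def farthest_first(X, c):
    """Assumed deterministic seeding: start at X[0], then repeatedly add the
    point farthest from every centroid chosen so far."""
    V = [list(X[0])]
    while len(V) < c:
        V.append(list(max(X, key=lambda x: min(sq_dist(x, v) for v in V))))
    return V


def fuzzy_c_means(X, c, m=2.0, iters=100):
    """Standard fuzzy c-means: U[i][j] is gene j's membership in cluster i,
    and the memberships of each gene sum to 1 across clusters."""
    V = farthest_first(X, c)
    U = [[0.0] * len(X) for _ in range(c)]
    for _ in range(iters):
        for j, x in enumerate(X):
            d = [max(sq_dist(x, v), 1e-12) for v in V]  # guard against d = 0
            for i in range(c):
                U[i][j] = 1.0 / sum((d[i] / d[k]) ** (1.0 / (m - 1)) for k in range(c))
        V = [weighted_mean(X, [u ** m for u in U[i]]) for i in range(c)]
    return V, U


def possibilistic_c_means(X, c, m=2.0, iters=100):
    """Possibilistic c-means: T[i][j] is the typicality of gene j for cluster i,
    in (0, 1], with no sum-to-one constraint across clusters."""
    V, U = fuzzy_c_means(X, c, m, iters)
    # Scale eta_i of each cluster, estimated from the FCM partition (K = 1).
    eta = []
    for i in range(c):
        w = [u ** m for u in U[i]]
        spread = sum(wj * sq_dist(x, V[i]) for wj, x in zip(w, X)) / sum(w)
        eta.append(max(spread, 1e-12))
    T = [[0.0] * len(X) for _ in range(c)]
    for _ in range(iters):
        for i in range(c):
            for j, x in enumerate(X):
                T[i][j] = 1.0 / (1.0 + (sq_dist(x, V[i]) / eta[i]) ** (1.0 / (m - 1)))
        V = [weighted_mean(X, [t ** m for t in T[i]]) for i in range(c)]
    return V, T


# Hypothetical characteristic matrix: 6 genes (rows) x 3 conditions (columns).
X = [
    [1.0, 1.1, 0.9], [1.2, 1.0, 1.0], [0.9, 0.8, 1.1],
    [5.0, 5.2, 4.9], [5.1, 4.8, 5.0], [4.9, 5.1, 5.2],
]
centroids, T = possibilistic_c_means(X, c=2)
# Assign each gene to the cluster in which it is most typical.
labels = [max(range(len(T)), key=lambda i: T[i][j]) for j in range(len(X))]
print(labels)
```

The possibilistic model was chosen for the sketch because, unlike fuzzy c-means, its typicalities are not forced to sum to one for each gene. An atypical gene can therefore score low in every cluster instead of being split among them. Treat this as one plausible reading of the "fuzzy logic" step, not as the authors' method.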

Publication types

  • Comparative Study
  • Evaluation Study

MeSH terms

  • Algorithms
  • Cluster Analysis
  • Databases, Genetic*
  • Fuzzy Logic
  • Genomics / methods*
  • Genomics / statistics & numerical data
  • Models, Statistical
  • Systems Biology