Discovering novelty in sequential patterns: application for analysis of microarray data on Alzheimer disease

Stud Health Technol Inform. 2010;160(Pt 2):1314-8.

Abstract

Analyzing microarray data remains a major challenge because existing methods produce large numbers of uninformative results. We propose a new method, called NoDisco, for discovering novelties in gene sequences obtained by applying data-mining techniques to microarray data.

Method: We identify popular genes, which are often cited in the literature, and innovative genes, which are linked to the popular genes in the sequences but are not mentioned in the literature. We also identify popular and innovative sequences containing these genes. Biologists can thus select interesting sequences from the two sets and obtain the k-best documents.
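The gene and sequence selection described above can be illustrated with a minimal sketch. It is not the NoDisco implementation: the function names, the citation-count dictionary, the `popular_min` threshold, and the rule that "not mentioned" means a count of zero are all assumptions made for illustration, and the gene labels are toy placeholders rather than real data.

```python
from typing import Dict, List, Set, Tuple

Sequence = List[str]  # one mined sequence, as an ordered list of gene labels


def classify_genes(
    sequences: List[Sequence],
    citation_counts: Dict[str, int],
    popular_min: int = 10,  # hypothetical cutoff for "often cited"
) -> Tuple[Set[str], Set[str]]:
    """Return (popular, innovative) gene sets under the assumed rules."""
    # Popular: genes whose literature citation count reaches the cutoff.
    popular = {
        gene
        for seq in sequences
        for gene in seq
        if citation_counts.get(gene, 0) >= popular_min
    }
    # Innovative: uncited genes that share a sequence with a popular gene.
    innovative: Set[str] = set()
    for seq in sequences:
        genes = set(seq)
        if genes & popular:
            innovative |= {g for g in genes if citation_counts.get(g, 0) == 0}
    return popular, innovative


def classify_sequences(
    sequences: List[Sequence], popular: Set[str], innovative: Set[str]
) -> Tuple[List[Sequence], List[Sequence]]:
    """Return the sequences containing popular genes and innovative genes."""
    popular_seqs = [s for s in sequences if set(s) & popular]
    innovative_seqs = [s for s in sequences if set(s) & innovative]
    return popular_seqs, innovative_seqs


# Toy input: placeholder gene labels and made-up citation counts.
sequences = [["G1", "G2"], ["G3", "G4"], ["G1", "G5"]]
citation_counts = {"G1": 50, "G2": 0, "G3": 2, "G4": 0, "G5": 3}

popular, innovative = classify_genes(sequences, citation_counts)
popular_seqs, innovative_seqs = classify_sequences(sequences, popular, innovative)
```

In this toy input, G4 is uncited but is not flagged as innovative because it never appears in a sequence with a popular gene, which reflects the linkage condition stated in the method. How the top-k documents are ranked is not specified in the abstract, so that step is left out of the sketch.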

Results: We show the efficiency of this method by applying it to real data used to decipher the mechanisms underlying Alzheimer disease.

Conclusion: The initial selection of sequences based on popularity and innovation helps experts focus on relevant sequences, while the top-k documents help them understand those sequences.

MeSH terms

  • Algorithms
  • Alzheimer Disease / genetics*
  • Data Mining / methods
  • Gene Expression Profiling / methods*
  • Humans
  • Oligonucleotide Array Sequence Analysis / methods*