Finding top-k covering irreducible contrast sequence rules for disease diagnosis

Comput Math Methods Med. 2015:2015:353146. doi: 10.1155/2015/353146. Epub 2015 Mar 10.

Abstract

Diagnostic genes are commonly used to distinguish between disease phenotypes. Most existing methods for finding diagnostic genes rely on the discriminative power of either individual genes or combinations of genes. However, both approaches ignore the common expression trends among genes. In this paper, we devise a novel type of sequence rule, the top-k irreducible covering contrast sequence rules (TopkIRs for short), which help to build a highly accurate sample classifier. Furthermore, we propose an algorithm called MineTopkIRs to discover TopkIRs efficiently. Extensive experiments on synthetic and real datasets show that MineTopkIRs is significantly faster than previous methods and achieves higher classification accuracy. Additionally, many of the diagnostic genes discovered provide new insight into disease diagnosis.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Computational Biology / methods
  • Databases, Factual
  • Diagnosis, Computer-Assisted / methods*
  • Gene Expression Profiling
  • Gene Expression Regulation
  • Genetic Predisposition to Disease
  • Humans
  • Oligonucleotide Array Sequence Analysis
  • Phenotype
  • Reproducibility of Results
  • Software