DTFP-Growth: Dynamic Threshold-Based FP-Growth Rule Mining Algorithm Through Integrating Gene Expression, Methylation, and Protein-Protein Interaction Profiles

IEEE Trans Nanobioscience. 2018 Apr;17(2):117-125. doi: 10.1109/TNB.2018.2803021.

Abstract

Association rule mining is an important technique for identifying interesting relationships between gene pairs in a biological data set. Earlier methods basically work for a single biological data set, and, in maximum cases, a single minimum support cutoff can be applied globally, i.e., across all genesets/itemsets. To overcome this limitation, in this paper, we propose dynamic threshold-based FP-growth rule mining algorithm that integrates gene expression, methylation and protein-protein interaction profiles based on weighted shortest distance to find the novel associations among different pairs of genes in multi-view data sets. For this purpose, we introduce three new thresholds, namely, Distance-based Variable/Dynamic Supports (DVS), Distance-based Variable Confidences (DVC), and Distance-based Variable Lifts (DVL) for each rule by integrating co-expression, co-methylation, and protein-protein interactions existed in the multi-omics data set. We develop the proposed algorithm utilizing these three novel multiple threshold measures. In the proposed algorithm, the values of , , and are computed for each rule separately, and subsequently it is verified whether the support, confidence, and lift of each evolved rule are greater than or equal to the corresponding individual , , and values, respectively, or not. If all these three conditions for a rule are found to be true, the rule is treated as a resultant rule. One of the major advantages of the proposed method compared with other related state-of-the-art methods is that it considers both the quantitative and interactive significance among all pairwise genes belonging to each rule. Moreover, the proposed method generates fewer rules, takes less running time, and provides greater biological significance for the resultant top-ranking rules compared to previous methods.

MeSH terms

  • Algorithms*
  • Bayes Theorem
  • Computational Biology / methods*
  • DNA Methylation
  • Data Mining / methods*
  • Databases, Protein
  • Gene Expression Profiling / methods*
  • Humans
  • Protein Interaction Mapping / methods*
  • Proteins / chemistry
  • Proteins / genetics
  • Proteins / metabolism

Substances

  • Proteins