Optimal ranking and directional signature classification using the integral strategy of multi-objective optimization-based association rule mining of multi-omics data

Front Bioinform. 2023 Jul 27:3:1182176. doi: 10.3389/fbinf.2023.1182176. eCollection 2023.

Abstract

Introduction: Association rule mining (ARM) is a powerful tool for exploring the informative relationships among multiple items (genes) in any dataset. The main problem of ARM is that it generates many rules containing different rule-informative values, which becomes a challenge for the user to choose the effective rules. In addition, few works have been performed on the integration of multiple biological datasets and variable cutoff values in ARM. Methods: To solve all these problems, in this article, we developed a novel framework MOOVARM (multi-objective optimized variable cutoff-based association rule mining) for multi-omics profiles. Results: In this regard, we identified the positive ideal solution (PIS), which maximized the profit and minimized the loss, and negative ideal solution (NIS), which minimized the profit and maximized the loss for all gene sets (item sets), belonging to each extracted rule. Thereafter, we computed the distance (d +) from PIS and distance (d -) from NIS for each gene set or product. These two distances played an important role in determining the optimized associations among various pairs of genes in the multi-omics dataset. We then globally estimated the relative closeness to PIS for ranking the gene sets. When the relative closeness score of the rule is greater than or equal to the pre-defined threshold value, the rule can be considered a final resultant rule. Moreover, MOOVARM evaluated the relative score of the rule based on the status of all genes instead of individual genes. Conclusions: MOOVARM produced the final rank of the extracted (multi-objective optimized) rules of correlated genes which had better disease classification than the state-of-the-art algorithms on gene signature identification.

Keywords: distance-based dynamic support threshold; empirical Bayes test; multi-objective optimization; multi-objective optimized association rule mining; multi-omics sarcoma data.

Grants and funding

ZZ was partially supported by the Cancer Prevention and Research Institute of Texas (CPRIT RP170668 and RP180734) (to ZZ). Publication costs were funded by ZZ’s Professorship Fund. The funder had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.