CauRuler: Causal irredundant association rule miner for complex patient trajectory modelling

Comput Biol Med. 2023 Mar:155:106636. doi: 10.1016/j.compbiomed.2023.106636. Epub 2023 Feb 9.

Abstract

Background and objectives: Discovering causal associations between variables is one of the main goals of clinical trials, with the ultimate aim of identifying the causes of a specific health status. Prior knowledge of causal paths could help prevent patients from developing the resulting conditions. In recent years, thanks to the enormous amount of health data stored with the support of digital tools, attempts have been made to employ Machine Learning to infer causality. Those methodologies have deficiencies both in controlling confounders when analysing causality and in providing causal rules general enough to be useful in healthcare practice. To address these deficiencies, this work presents and evaluates CauRuler, a new approach to deriving causality from association rules. The proposed approach uses a pruning strategy to reduce the association rule set without compromising the causality learning capability of the algorithm. This makes the algorithm suitable for exploiting large health databases with thousands of patients and medical instances. CauRuler can control a larger number of confounders than other proposals, bringing robustness to causal analysis and avoiding the identification of spurious associations. Additionally, the method generalizes causality using anti-monotone properties to obtain complex and general causal paths. The method can target correct causal associations in complex medical databases with retrospective data.

Method: CauRuler extends association rule mining with an irredundancy property so that the set of rules learnt is reduced in size and generalized. General association rules, composed of fewer items, make it possible to control more confounding variables and so to verify, with more statistical evidence from the available data, whether they represent causal paths in patient disease trajectories.
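To make the idea of irredundancy pruning concrete, the following is a minimal sketch and not the CauRuler implementation. It assumes one common redundancy criterion: a rule is dropped when a more general rule (a strict subset of its antecedent, same consequent) has at least the same confidence. The function names `mine_rules` and `prune_redundant`, the thresholds, and the toy visit records are all hypothetical and chosen only for illustration.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Transaction = FrozenSet[str]
Rule = Tuple[FrozenSet[str], str]  # (antecedent items, single consequent item)


def support(itemset: FrozenSet[str], transactions: List[Transaction]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent: FrozenSet[str], consequent: str,
               transactions: List[Transaction]) -> float:
    """Share of transactions containing the antecedent that also contain the consequent."""
    covered = [t for t in transactions if antecedent <= t]
    if not covered:
        return 0.0
    return sum(1 for t in covered if consequent in t) / len(covered)


def mine_rules(transactions: List[Transaction], min_support: float,
               min_confidence: float, max_len: int = 2) -> Dict[Rule, float]:
    """Brute-force rule mining; adequate for a toy example only."""
    items = sorted(set().union(*transactions))
    rules: Dict[Rule, float] = {}
    for consequent in items:
        others = [i for i in items if i != consequent]
        for k in range(1, max_len + 1):
            for combo in combinations(others, k):
                antecedent = frozenset(combo)
                if support(antecedent | {consequent}, transactions) < min_support:
                    continue
                conf = confidence(antecedent, consequent, transactions)
                if conf >= min_confidence:
                    rules[(antecedent, consequent)] = conf
    return rules


def prune_redundant(rules: Dict[Rule, float]) -> Dict[Rule, float]:
    """Assumed criterion: drop a rule if a strictly more general rule with the
    same consequent reaches at least the same confidence."""
    kept: Dict[Rule, float] = {}
    for (antecedent, consequent), conf in rules.items():
        redundant = any(
            other_cons == consequent
            and other_ante < antecedent  # strict subset: more general rule
            and other_conf >= conf
            for (other_ante, other_cons), other_conf in rules.items()
        )
        if not redundant:
            kept[(antecedent, consequent)] = conf
    return kept


# Toy, invented visit records: each set lists diagnoses seen for one patient.
transactions = [
    frozenset({"diabetes", "hypertension", "ckd"}),
    frozenset({"diabetes", "ckd"}),
    frozenset({"diabetes", "hypertension", "ckd"}),
    frozenset({"hypertension"}),
    frozenset({"diabetes", "obesity", "ckd"}),
    frozenset({"obesity"}),
]

rules = mine_rules(transactions, min_support=0.3, min_confidence=0.9)
pruned = prune_redundant(rules)

for (antecedent, consequent), conf in sorted(pruned.items(), key=str):
    print(sorted(antecedent), "->", consequent, f"(confidence {conf:.2f})")
```

On this toy data the two-item rules add nothing over their one-item generalizations, so only the shorter rules survive. Shorter antecedents leave more of the remaining variables free to be treated as potential confounders, which is the motivation stated in the Method paragraph.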

Results: CauRuler has been tested on a complex real medical database (3.5 M visits to primary care services between 2019 and 2020, controlling for over 15,000 different variables including diagnoses, demographic data and other clinical patient data). The pruning strategy reduces the rule set from 7,732 to 2,240 rules, of which 46 were found to have causal relationships in the patient trajectories; these were generalized to 14 rules tested as true causal relationships thanks to the confounding analysis. These rules have been validated by clinicians with the support of a graphical map. The obtained causal paths control an average of 906 confounder variables, yielding robust results.
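The rule counts reported in the Results can be read as a funnel. The short calculation below only restates those reported counts as proportions; the variable names are illustrative, and no figures beyond the four counts in the abstract are used.

```python
# Rule counts as reported in the Results paragraph.
mined_rules = 7732          # rules before pruning
after_pruning = 2240        # rules kept by the pruning strategy
causal_candidates = 46      # pruned rules found to have causal relationships
generalized_causal = 14     # generalized rules confirmed by confounding analysis

pruning_reduction = 1 - after_pruning / mined_rules
causal_share = causal_candidates / after_pruning
generalization_ratio = generalized_causal / causal_candidates

print(f"Rules removed by pruning: {pruning_reduction:.1%}")
print(f"Pruned rules flagged as causal: {causal_share:.1%}")
print(f"Generalized rules per causal candidate: {generalization_ratio:.1%}")
```

Roughly 71% of the mined rules are removed by pruning, and about 2% of the remaining rules are flagged as causal before generalization.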

Conclusions: Causal relationships enable predicting causal paths between health conditions according to patient trajectories. Knowing these causal paths is crucial for understanding and preventing the appearance or worsening of diseases in patients. CauRuler, with highly demanding thresholds, has proven its efficiency and effectiveness in targeting previously known causal associations between diagnoses, reaching consensus in the medical community. Softening these thresholds should help target interesting general causal paths.

Keywords: Association rules; Causal inference; Health data mining; Non-redundant associations.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Humans
  • Machine Learning*
  • Retrospective Studies