Practical Guide to Honest Causal Forests for Identifying Heterogeneous Treatment Effects

Am J Epidemiol. 2023 Jul 7;192(7):1155-1165. doi: 10.1093/aje/kwad043.

Abstract

The term "heterogeneous treatment effects" refers to conditional average treatment effects (CATEs) that vary across population subgroups. Epidemiologists are often interested in estimating such effects because they can help identify populations that may particularly benefit from, or be harmed by, a treatment. However, standard regression approaches for estimating heterogeneous effects are limited by preexisting hypotheses, test a single effect modifier at a time, and are subject to the multiple-comparisons problem. In this article, we offer a practical guide to honest causal forests, an ensemble tree-based learning method that can both discover and estimate heterogeneous treatment effects using a data-driven approach. We discuss the fundamentals of tree-based methods, describe how honest causal forests can identify and estimate heterogeneous effects, and demonstrate an implementation of this method using simulated data. Our implementation highlights the steps required to simulate data sets, build honest causal forests, and assess model performance across a variety of simulation scenarios. Overall, this paper is intended for epidemiologists and other population health researchers who lack an extensive background in machine learning yet are interested in using an emerging method for identifying and estimating heterogeneous treatment effects.
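
The idea of "honesty" mentioned above can be illustrated with a small sketch: one half of the sample is used only to choose where a tree splits, and the other half is used only to estimate the treatment effect within each resulting leaf. The Python code below is an illustrative toy and not the implementation described in the article. It fits a single one-split tree rather than a forest, and it assumes a randomized binary treatment and a simplified splitting score (child sizes multiplied by the squared gap in estimated effects). All function names, the simulated data-generating process, and the parameter values are hypothetical.

```python
import numpy as np


def simulate(n, seed=0):
    """Simulate a randomized study whose true effect depends only on X[:, 0]."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(-1, 1, size=(n, 3))
    W = rng.integers(0, 2, size=n)           # randomized binary treatment
    tau = np.where(X[:, 0] > 0, 2.0, 0.0)    # assumed true CATE: 2 if X0 > 0, else 0
    Y = X[:, 1] + tau * W + rng.normal(0, 1, size=n)
    return X, W, Y


def diff_in_means(Y, W):
    """Treated-minus-control mean outcome (valid here because W is randomized)."""
    return Y[W == 1].mean() - Y[W == 0].mean()


def choose_split(X, W, Y, min_arm=25):
    """Pick the (feature, threshold) whose two children differ most in estimated effect."""
    best_score, best_split = -np.inf, None
    for j in range(X.shape[1]):
        for t in np.quantile(X[:, j], np.linspace(0.1, 0.9, 17)):
            left = X[:, j] <= t
            # Require enough treated and control units in both children.
            counts = [np.sum(W[m] == a) for m in (left, ~left) for a in (0, 1)]
            if min(counts) < min_arm:
                continue
            gap = diff_in_means(Y[left], W[left]) - diff_in_means(Y[~left], W[~left])
            score = left.sum() * (~left).sum() * gap ** 2  # simplified criterion
            if score > best_score:
                best_score, best_split = score, (j, float(t))
    if best_split is None:
        raise ValueError("no admissible split; try a larger sample or smaller min_arm")
    return best_split


def honest_stump(X, W, Y, seed=0):
    """Choose the split on one half of the data, estimate leaf effects on the other."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(Y))
    split_half, est_half = idx[: len(Y) // 2], idx[len(Y) // 2:]
    j, t = choose_split(X[split_half], W[split_half], Y[split_half])
    left = X[est_half, j] <= t
    Ye, We = Y[est_half], W[est_half]
    return {
        "feature": j,
        "threshold": t,
        "effect_left": float(diff_in_means(Ye[left], We[left])),
        "effect_right": float(diff_in_means(Ye[~left], We[~left])),
    }


X, W, Y = simulate(8000, seed=0)
print(honest_stump(X, W, Y, seed=1))
```

Because the leaf estimates come from observations that played no role in choosing the split, they are not biased by the search over candidate splits. A causal forest averages many such trees built on different subsamples; in practice, researchers would rely on an established package rather than a hand-written sketch like this one.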

Keywords: data science; effect modifiers; epidemiologic methods; honest causal forests; machine learning; precision medicine.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Causality
  • Computer Simulation
  • Forests*
  • Humans
  • Machine Learning*