Targeted maximum likelihood estimation for a binary treatment: A tutorial

Miguel Angel Luque-Fernandez; Michael Schomaker; Bernard Rachet; Mireille E Schnitzer

doi:10.1002/sim.7628

Stat Med. 2018 Jul 20;37(16):2530-2546. doi: 10.1002/sim.7628. Epub 2018 Apr 23.

Authors

Miguel Angel Luque-Fernandez¹ ² ³, Michael Schomaker⁴, Bernard Rachet¹, Mireille E Schnitzer⁵

Affiliations

¹ Cancer Survival Group, Department of Non-Communicable Disease Epidemiology, Faculty of Epidemiology and Population Health, London School of Hygiene and Tropical Medicine, London, UK.
² Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
³ Biomedical Research Institute of Granada, Non-Communicable and Cancer Epidemiology Group (ibs.Granada), Andalusian School of Public Health, Granada, Spain.
⁴ School of Public Health and Family Medicine, Center for Infectious Disease Epidemiology and Research, The University of Cape Town, Cape Town, South Africa.
⁵ Faculté de pharmacie, Université de Montréal, Montréal, Canada.

Abstract

When estimating the average effect of a binary treatment (or exposure) on an outcome, methods that incorporate propensity scores, the G-formula, or targeted maximum likelihood estimation (TMLE) are preferred over naïve regression approaches, which are biased under misspecification of a parametric outcome model. In contrast, propensity score methods require the correct specification of an exposure model. Double-robust methods only require correct specification of either the outcome or the exposure model. Targeted maximum likelihood estimation is a semiparametric double-robust method that improves the chances of correct model specification by allowing for flexible estimation using (nonparametric) machine-learning methods. It therefore requires weaker assumptions than its competitors. We provide a step-by-step guided implementation of TMLE and illustrate it in a realistic scenario based on cancer epidemiology where assumptions about correct model specification and positivity (which fails when a study participant has 0 probability of receiving the treatment) are nearly violated. This article provides a concise and reproducible educational introduction to TMLE for a binary outcome and exposure. The reader should gain sufficient understanding of TMLE from this introductory tutorial to be able to apply the method in practice. Extensive R code is provided in easy-to-read boxes throughout the article for replicability. Stata users will find a testing implementation of TMLE and additional material in Appendix S1 and at the following GitHub repository: https://github.com/migariane/SIM-TMLE-tutorial.

Keywords: causal inference; ensemble learning; machine learning; observational studies; targeted maximum likelihood estimation.
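
The double-robustness property described in the abstract can be illustrated with a minimal sketch. This is not the authors' code (the tutorial itself provides R code): it is a pure-Python toy, assuming a single binary confounder W, a hypothetical data set built from hand-picked cell counts, a deliberately misspecified initial outcome model that ignores W, and a correctly specified (saturated) exposure model. All names (`cells`, `q_init`, `g`, `clever`, `q_star`) are illustrative assumptions.

```python
import math

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def logit(p):
    return math.log(p / (1.0 - p))

# Hypothetical toy data: (W, A, Y, count). W confounds A and Y.
cells = [(0, 1, 1, 10), (0, 1, 0, 10), (0, 0, 1, 16), (0, 0, 0, 64),
         (1, 1, 1, 48), (1, 1, 0, 12), (1, 0, 1, 20), (1, 0, 0, 20)]
data = [(w, a, y) for w, a, y, k in cells for _ in range(k)]
n = len(data)

# Step 1: initial outcome model E[Y | A], deliberately misspecified
# because it ignores the confounder W.
def mean_y(a):
    ys = [y for _, ai, y in data if ai == a]
    return sum(ys) / len(ys)

q_init = {a: mean_y(a) for a in (0, 1)}
naive_ate = q_init[1] - q_init[0]

# Step 2: exposure model g(W) = P(A = 1 | W), correctly specified
# here as the observed proportion treated within each level of W.
def prop_treated(w):
    treated = [a for wi, a, _ in data if wi == w]
    return sum(treated) / len(treated)

g = {w: prop_treated(w) for w in (0, 1)}

# Step 3: clever covariate H(A, W) = A / g(W) - (1 - A) / (1 - g(W)).
def clever(a, w):
    return a / g[w] - (1 - a) / (1.0 - g[w])

# Step 4: fluctuation parameter eps from a logistic regression of Y on H
# with offset logit(q_init), fitted by Newton's method on the score.
eps = 0.0
for _ in range(100):
    score, info = 0.0, 0.0
    for w, a, y in data:
        h = clever(a, w)
        p = expit(logit(q_init[a]) + eps * h)
        score += h * (y - p)
        info += h * h * p * (1.0 - p)
    step = score / info
    eps += step
    if abs(step) < 1e-12:
        break

# Step 5: updated predictions under A = 1 and A = 0, averaged over the sample.
def q_star(a, w):
    return expit(logit(q_init[a]) + eps * clever(a, w))

tmle_ate = sum(q_star(1, w) - q_star(0, w) for w, _, _ in data) / n

print(f"naive ATE: {naive_ate:.3f}")
print(f"TMLE ATE:  {tmle_ate:.3f}")
```

In this toy data the unadjusted contrast is 0.425, whereas the W-standardized risk difference is 0.30. Because the exposure model is correct, the targeting step pulls the misspecified initial estimate to approximately 0.30. A realistic analysis would use flexible learners for both models and an influence-curve-based standard error, which this sketch omits.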

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Computer Simulation
Data Interpretation, Statistical
Epidemiologic Methods*
Humans
Likelihood Functions*
Machine Learning
Neoplasms / epidemiology
Propensity Score
