Dynamic programming for solving a simulated clinical scenario of sepsis resuscitation

Ann Palliat Med. 2021 Apr;10(4):3715-3725. doi: 10.21037/apm-20-2084. Epub 2021 Feb 23.

Abstract

Background: A major challenge in clinical research is population heterogeneity: medical decision making should account for both an individual's treatment history and current condition. This idea of precision medicine cannot be fully realized in traditional randomized controlled trials. Reinforcement learning (RL) is developing rapidly and has found its way into many fields, including clinical medicine, where it is used to find optimal treatment strategies. The key idea of RL is to optimize the treatment policy based on the current state and previous treatment history, which is consistent with the idea behind dynamic programming (DP). DP is a prototype of RL and can be implemented when the system dynamics can be fully quantified.

Methods: The present article aims to illustrate how to apply the DP algorithm to a clinical scenario of sepsis resuscitation. The state transition dynamics are constructed within the framework of a Markov decision process (MDP). The state space is defined by mean arterial pressure (MAP) and lactate; the action space consists of fluid administration and vasopressor use. The implementation of policy evaluation, policy improvement, and policy iteration is explained with R code, as sketched below.
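
To make the setup concrete, the following is a minimal R sketch of iterative policy evaluation on a toy version of this MDP. The four states (combinations of low/normal MAP and high/normal lactate), the two actions (fluid, vasopressor), and all transition probabilities and rewards are invented here purely for illustration and are not taken from the paper; the function name policyEval() mirrors the one mentioned in the Results, but its actual interface in the paper may differ.

```r
## Toy sepsis MDP (illustrative only, not the authors' code).
states  <- c("lowMAP_highLac", "lowMAP_normLac", "normMAP_highLac", "normMAP_normLac")
actions <- c("fluid", "vasopressor")
nS <- length(states); nA <- length(actions)

## P[s, a, s']: assumed transition probabilities; each P[s, a, ] sums to 1.
P <- array(0, dim = c(nS, nA, nS), dimnames = list(states, actions, states))
P["lowMAP_highLac",  "fluid",       ] <- c(0.30, 0.30, 0.20, 0.20)
P["lowMAP_highLac",  "vasopressor", ] <- c(0.25, 0.15, 0.40, 0.20)
P["lowMAP_normLac",  "fluid",       ] <- c(0.10, 0.30, 0.10, 0.50)
P["lowMAP_normLac",  "vasopressor", ] <- c(0.10, 0.20, 0.20, 0.50)
P["normMAP_highLac", "fluid",       ] <- c(0.10, 0.10, 0.40, 0.40)
P["normMAP_highLac", "vasopressor", ] <- c(0.20, 0.10, 0.40, 0.30)
P["normMAP_normLac", "fluid",       ] <- c(0.05, 0.10, 0.10, 0.75)
P["normMAP_normLac", "vasopressor", ] <- c(0.05, 0.05, 0.15, 0.75)

## Assumed reward: +1 for reaching the fully normalized state, 0 otherwise.
R <- ifelse(states == "normMAP_normLac", 1, 0)

## Iterative policy evaluation: repeatedly apply the Bellman backup
## V(s) <- sum_a pi(a|s) * sum_s' P(s'|s,a) * [R(s') + gamma * V(s')]
policyEval <- function(pi, gamma = 0.9, tol = 1e-6) {
  V <- rep(0, nS)
  repeat {
    V_new <- sapply(seq_len(nS), function(s) {
      sum(sapply(seq_len(nA), function(a) {
        pi[s, a] * sum(P[s, a, ] * (R + gamma * V))
      }))
    })
    if (max(abs(V_new - V)) < tol) break
    V <- V_new
  }
  V
}

## Evaluate a uniform random policy as a starting point.
pi_random <- matrix(1 / nA, nrow = nS, ncol = nA, dimnames = list(states, actions))
policyEval(pi_random)
```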

Results: The DP algorithm was able to find the optimal treatment policy depending on the current state and previous conditions. The iteration process converged in a finite number of steps. We defined several functions, such as nextStep(), policyEval(), and policy_iteration(), to implement the DP algorithm.
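
Continuing the toy model above, the sketch below adds greedy policy improvement and a policy iteration loop that stops once the greedy policy no longer changes, which occurs after a finite number of sweeps. Here policy_iteration() is an illustrative stand-in for the paper's function of the same name, whose real signature is not shown in the abstract, and it reuses P, R, nS, nA, actions, states, and policyEval() from the previous sketch.

```r
## Greedy policy improvement: act greedily with respect to the current value V.
policy_improvement <- function(V, gamma = 0.9) {
  q <- sapply(seq_len(nA), function(a)
    sapply(seq_len(nS), function(s) sum(P[s, a, ] * (R + gamma * V))))
  best <- apply(q, 1, which.max)              # greedy action index per state
  pi_new <- matrix(0, nrow = nS, ncol = nA)
  pi_new[cbind(seq_len(nS), best)] <- 1       # deterministic greedy policy
  pi_new
}

## Policy iteration: alternate evaluation and improvement until the policy is stable.
policy_iteration <- function(gamma = 0.9) {
  pi <- matrix(1 / nA, nrow = nS, ncol = nA)  # start from a uniform random policy
  repeat {
    V <- policyEval(pi, gamma)
    pi_new <- policy_improvement(V, gamma)
    if (all(pi_new == pi)) break              # stable policy: no further improvement
    pi <- pi_new
  }
  list(policy = pi, value = V)
}

res <- policy_iteration()
setNames(actions[apply(res$policy, 1, which.max)], states)  # recommended action per state
```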

Conclusions: This article illustrates how DP can be used to solve a clinical problem. We show that DP is a potentially useful tool for tailoring treatment strategies to patients with different conditions/states. The intended audience comprises readers interested in using DP to solve clinical problems with dynamically changing states.

Keywords: Dynamic programming (DP); Markov decision process; reinforcement learning (RL); sepsis.

MeSH terms

  • Algorithms*
  • Clinical Decision-Making
  • Humans
  • Markov Chains
  • Sepsis* / therapy