Dynamic programming for solving a simulated clinical scenario of sepsis resuscitation

Ann Palliat Med. 2021 Apr;10(4):3715-3725. doi: 10.21037/apm-20-2084. Epub 2021 Feb 23.

Abstract

Background: A major challenge in clinical research is population heterogeneity: medical decision making should account for both an individual's treatment history and current condition. This idea of precision medicine cannot be fully realized in traditional randomized controlled trials. Reinforcement learning (RL) is developing rapidly and has found its way into many fields, including clinical medicine, where it is used to find optimal treatment strategies. The key idea of RL is to optimize the treatment policy based on the current state and previous treatment history, which is consistent with the idea behind dynamic programming (DP). DP is a prototype of RL and can be implemented when the system dynamics can be fully quantified.

Methods: The present article aims to illustrate how to apply the DP algorithm to a clinical scenario of sepsis resuscitation. The state transition dynamics are constructed within the framework of a Markov decision process (MDP). The state space is defined by mean arterial pressure (MAP) and lactate; the action space consists of fluid administration and vasopressor use. The implementation of policy evaluation, policy improvement, and policy iteration is explained with R code, as sketched below.
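
To make the setup concrete, the following is a minimal R sketch of iterative policy evaluation on a toy version of this MDP. The four states (combinations of low/normal MAP and high/normal lactate), the two actions (fluid, vasopressor), and all transition probabilities and rewards are invented here purely for illustration and are not taken from the paper; the function name policyEval() mirrors the one mentioned in the Results, but its actual interface in the paper may differ.

```r
## Toy sepsis MDP (illustrative only, not the authors' code).
states  <- c("lowMAP_highLac", "lowMAP_normLac", "normMAP_highLac", "normMAP_normLac")
actions <- c("fluid", "vasopressor")
nS <- length(states); nA <- length(actions)

## P[s, a, s']: assumed transition probabilities; each P[s, a, ] sums to 1.
P <- array(0, dim = c(nS, nA, nS), dimnames = list(states, actions, states))
P["lowMAP_highLac",  "fluid",       ] <- c(0.30, 0.30, 0.20, 0.20)
P["lowMAP_highLac",  "vasopressor", ] <- c(0.25, 0.15, 0.40, 0.20)
P["lowMAP_normLac",  "fluid",       ] <- c(0.10, 0.30, 0.10, 0.50)
P["lowMAP_normLac",  "vasopressor", ] <- c(0.10, 0.20, 0.20, 0.50)
P["normMAP_highLac", "fluid",       ] <- c(0.10, 0.10, 0.40, 0.40)
P["normMAP_highLac", "vasopressor", ] <- c(0.20, 0.10, 0.40, 0.30)
P["normMAP_normLac", "fluid",       ] <- c(0.05, 0.10, 0.10, 0.75)
P["normMAP_normLac", "vasopressor", ] <- c(0.05, 0.05, 0.15, 0.75)

## Assumed reward: +1 for reaching the fully normalized state, 0 otherwise.
R <- ifelse(states == "normMAP_normLac", 1, 0)

## Iterative policy evaluation: repeatedly apply the Bellman backup
## V(s) <- sum_a pi(a|s) * sum_s' P(s'|s,a) * [R(s') + gamma * V(s')]
policyEval <- function(pi, gamma = 0.9, tol = 1e-6) {
  V <- rep(0, nS)
  repeat {
    V_new <- sapply(seq_len(nS), function(s) {
      sum(sapply(seq_len(nA), function(a) {
        pi[s, a] * sum(P[s, a, ] * (R + gamma * V))
      }))
    })
    if (max(abs(V_new - V)) < tol) break
    V <- V_new
  }
  V
}

## Evaluate a uniform random policy as a starting point.
pi_random <- matrix(1 / nA, nrow = nS, ncol = nA, dimnames = list(states, actions))
policyEval(pi_random)
```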

Results: The DP algorithm was able to find the optimal treatment policy depending on the current state and previous conditions. The iteration process converged in a finite number of steps. We defined several functions, such as nextStep(), policyEval(), and policy_iteration(), to implement the DP algorithm.
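
Continuing the toy model above, the sketch below adds greedy policy improvement and a policy iteration loop that stops once the greedy policy no longer changes, which occurs after a finite number of sweeps. Here policy_iteration() is an illustrative stand-in for the paper's function of the same name, whose real signature is not shown in the abstract, and it reuses P, R, nS, nA, actions, states, and policyEval() from the previous sketch.

```r
## Greedy policy improvement: act greedily with respect to the current value V.
policy_improvement <- function(V, gamma = 0.9) {
  q <- sapply(seq_len(nA), function(a)
    sapply(seq_len(nS), function(s) sum(P[s, a, ] * (R + gamma * V))))
  best <- apply(q, 1, which.max)              # greedy action index per state
  pi_new <- matrix(0, nrow = nS, ncol = nA)
  pi_new[cbind(seq_len(nS), best)] <- 1       # deterministic greedy policy
  pi_new
}

## Policy iteration: alternate evaluation and improvement until the policy is stable.
policy_iteration <- function(gamma = 0.9) {
  pi <- matrix(1 / nA, nrow = nS, ncol = nA)  # start from a uniform random policy
  repeat {
    V <- policyEval(pi, gamma)
    pi_new <- policy_improvement(V, gamma)
    if (all(pi_new == pi)) break              # stable policy: no further improvement
    pi <- pi_new
  }
  list(policy = pi, value = V)
}

res <- policy_iteration()
setNames(actions[apply(res$policy, 1, which.max)], states)  # recommended action per state
```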

Conclusions: This article illustrates how DP can be used to solve a clinical problem. We show that DP is a potentially useful tool for tailoring treatment strategies to patients with different conditions/states. The intended audience comprises readers interested in using DP to solve clinical problems with dynamically changing states.

Keywords: Dynamic programming (DP); Markov decision process; reinforcement learning (RL); sepsis.

MeSH terms

  • Algorithms*
  • Clinical Decision-Making
  • Humans
  • Markov Chains
  • Sepsis* / therapy