Model-based reinforcement learning with non-Gaussian environment dynamics and its application to portfolio optimization

Huifang Huang; Ting Gao; Pengbo Li; Jin Guo; Peng Zhang; Nan Du; Jinqiao Duan

doi:10.1063/5.0155574

Chaos. 2023 Aug 1;33(8):083129. doi: 10.1063/5.0155574.

Authors

Huifang Huang¹, Ting Gao², Pengbo Li², Jin Guo², Peng Zhang², Nan Du³, Jinqiao Duan²,⁴

Affiliations

¹ School of Mathematics and Statistics, Huazhong University of Science and Technology, Wuhan 430074, China.
² Center for Mathematical Sciences, Huazhong University of Science and Technology, Wuhan 430074, China.
³ Tencent AI Lab, Shenzhen 518000, China.
⁴ Department of Mathematics, School of Sciences, Great Bay University, Dongguan 523000, China.

PMID: 37561122
DOI: 10.1063/5.0155574

Abstract

The rapid development of quantitative portfolio optimization in financial engineering has produced promising results in AI-based algorithmic trading strategies. However, the complexity of financial markets, including abrupt transitions, unpredictable hidden causal factors, and heavy-tailed behavior, makes comprehensive simulation challenging. This paper addresses these challenges by employing heavy-tail-preserving normalizing flows to simulate the high-dimensional joint probability distribution of the complex trading environment within a model-based reinforcement learning framework. Through experiments with various stocks from three financial markets (Dow, NASDAQ, and S&P), we demonstrate that the Dow outperforms the other two on multiple evaluation metrics in our testing system. Notably, our proposed method mitigates the impact of the unpredictable financial market crisis during the COVID-19 pandemic, resulting in a lower maximum drawdown. Additionally, we explore the explainability of our reinforcement learning algorithm: we employ the pattern causality method to study interactive relationships among stocks, analyze the training dynamics of the loss functions to check convergence, visualize high-dimensional state transition data with t-SNE to uncover effective patterns for portfolio optimization, and use eigenvalue analysis to study convergence properties of the environment's model.
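
For readers unfamiliar with the maximum drawdown metric mentioned in the abstract, the following is a minimal sketch of its usual definition: the largest peak-to-trough decline of a portfolio value series, expressed as a fraction of the running peak. The function name `max_drawdown` and the sample values are illustrative assumptions, not taken from the paper, and the authors' evaluation code may differ in detail.

```python
import numpy as np

def max_drawdown(values):
    """Largest peak-to-trough decline, as a fraction of the running peak.

    Assumes `values` is a sequence of strictly positive portfolio values
    ordered in time (an illustrative convention, not the paper's code).
    """
    values = np.asarray(values, dtype=float)
    # Highest value seen up to and including each time step.
    running_peak = np.maximum.accumulate(values)
    # Fractional decline from that peak at each time step.
    drawdowns = (running_peak - values) / running_peak
    return float(drawdowns.max())

# Hypothetical portfolio values, for illustration only.
portfolio_values = [100.0, 120.0, 90.0, 110.0, 80.0, 130.0]
mdd = max_drawdown(portfolio_values)
print(f"maximum drawdown: {mdd:.4f}")
```

In this made-up series the worst decline runs from the peak of 120 to the trough of 80, so the metric is one third. A lower value indicates that the strategy lost less from its previous high during a downturn.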