Implementation of Deep Deterministic Policy Gradients for Controlling Dynamic Bipedal Walking

Chujun Liu; Andrew G Lonsberry; Mark J Nandor; Musa L Audu; Alexander J Lonsberry; Roger D Quinn

doi:10.3390/biomimetics4010028

Biomimetics (Basel). 2019 Mar 22;4(1):28. doi: 10.3390/biomimetics4010028.

Authors

Chujun Liu¹, Andrew G Lonsberry², Mark J Nandor³, Musa L Audu⁴, Alexander J Lonsberry⁵, Roger D Quinn⁶

Affiliations

¹ Department of Mechanical and Aerospace Engineering, Case Western Reserve University, Cleveland, OH 44106, USA. cxl936@case.edu.
² Department of Mechanical and Aerospace Engineering, Case Western Reserve University, Cleveland, OH 44106, USA. agl10@case.edu.
³ Department of Mechanical and Aerospace Engineering, Case Western Reserve University, Cleveland, OH 44106, USA. mjn18@case.edu.
⁴ Department of Biomedical Engineering, Case Western Reserve University, Cleveland, OH 44106, USA. mxa93@case.edu.
⁵ Department of Mechanical and Aerospace Engineering, Case Western Reserve University, Cleveland, OH 44106, USA. ajl17@case.edu.
⁶ Department of Mechanical and Aerospace Engineering, Case Western Reserve University, Cleveland, OH 44106, USA. rdq@case.edu.

Abstract

A control system for bipedal walking in the sagittal plane was developed in simulation. The biped model was built from anthropometric data for a 1.8 m tall male of average build. At the core of the controller is a deep deterministic policy gradient (DDPG) neural network that was trained in GAZEBO, a physics simulator, to predict the ideal foot placement for maintaining stable walking despite external disturbances. The complexity of the DDPG network was reduced through carefully selected state variables and a distributed control system. Additional controllers for the hip joints during their stance phases and for the ankle joint during the toe-off phase help to stabilize the biped during walking. The simulated biped can walk at a steady pace of approximately 1 m/s, and during locomotion it remains stable when a 30 kg·m/s impulse is applied forward on the torso or a 40 kg·m/s impulse is applied rearward. It also maintains stable walking with a 10 kg backpack or a 25 kg front pack. The controller was trained on a 1.8 m tall model, but it also stabilizes models from 1.4 m to 2.3 m tall without modification.
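For readers unfamiliar with DDPG, the two update rules that define the generic algorithm can be sketched as below. This is only an illustration of standard DDPG, not the authors' implementation: the abstract does not give the network architecture, state variables, or hyperparameters, so the function names and the values of `gamma` and `tau` are assumptions chosen for the example.

```python
# Minimal sketch of two generic DDPG building blocks (not the paper's code).
# gamma and tau defaults are illustrative assumptions, not values from the paper.

def td_target(reward, next_q, done, gamma=0.99):
    """Critic regression target: y = r + gamma * Q'(s', mu'(s')).

    next_q is the target critic's value at the next state, evaluated at the
    action chosen by the target actor. Terminal transitions do not bootstrap.
    """
    return reward + (0.0 if done else gamma * next_q)


def soft_update(target_weights, online_weights, tau=0.001):
    """Polyak averaging of target-network weights.

    Each target weight moves a fraction tau toward the online weight:
    target <- tau * online + (1 - tau) * target.
    """
    return [tau * w + (1.0 - tau) * t
            for t, w in zip(target_weights, online_weights)]
```

In a controller like the one described, the actor's output would be a continuous foot-placement command, and the critic would be trained toward `td_target` while both target networks track their online counterparts through `soft_update`.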

Keywords: DDPG neural network; biped; gait; stability.

Grants and funding

1739800/National Science Foundation