Humanoid Locomotion and the Brain

Review
In: Humanoid Robotics and Neuroscience: Science, Engineering and Society. Boca Raton (FL): CRC Press/Taylor & Francis; 2015. Chapter 7.

Excerpt

In this chapter, we introduce a two-layered, biologically inspired biped learning system composed of a lower-level central pattern generator (CPG) and an upper-level reinforcement learning (RL) module. The lower-level CPG generates nominal periodic walking patterns and synchronizes them with sensory inputs in order to construct an attractive limit cycle. The upper-level RL module modulates the parameters of the CPG in order to improve walking performance with respect to the provided objective function. Figure 7.1 shows the schematic diagram of the biped learning system.

Biological systems seem to have a simpler but more robust locomotion strategy [26] than existing biped walking controllers for humanoid robots (e.g., [2]). For example, Grillner [17] and [39] showed that the cat locomotion system can generate a walking pattern without using higher brain functions. An early study of biologically inspired approaches to bipedal locomotion [45] suggested that the ability of the neural system to synchronize with periodic sensor inputs plays an important role in robust locomotion control.

Following these pioneering studies, there has been growing interest in biologically inspired locomotion control that uses either a CPG modeled by coupled neural oscillators [12,14,15,29] or a phase oscillator model with phase reset methods [36,47]. These studies use foot contact information or ground reaction forces to exploit the entrainment property of the neural or phase oscillator model.
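
The entrainment idea can be illustrated with a minimal sketch of a single phase oscillator whose phase is reset at each foot-contact event. This is not the chapter's controller: the function names, the reset-to-zero rule, the fixed contact schedule, and all numerical values are assumptions chosen only for illustration.

```python
import math


def step_phase(phase, omega, dt, contact, reset_phase=0.0):
    """Advance the oscillator phase by one time step.

    On a foot-contact event the phase is reset (assumed rule: jump to
    reset_phase); otherwise it advances at the natural frequency omega.
    """
    if contact:
        return reset_phase
    return (phase + omega * dt) % (2.0 * math.pi)


def phases_at_contact(omega, contact_period, dt=0.001, duration=5.0):
    """Return the oscillator phase observed just before each contact.

    Contacts arrive on a fixed schedule here (a hypothetical stand-in
    for the sensed foot contact of a walking robot).
    """
    phase = 1.0  # arbitrary initial phase
    observed = []
    n_steps = int(round(duration / dt))
    contact_every = int(round(contact_period / dt))
    for k in range(1, n_steps + 1):
        contact = (k % contact_every == 0)
        if contact:
            observed.append(phase)
        phase = step_phase(phase, omega, dt, contact)
    return observed


# Natural period 0.5 s, but contacts arrive every 0.55 s.
observed = phases_at_contact(omega=2.0 * math.pi / 0.5, contact_period=0.55)
```

After the first reset, the phase observed at every later contact is identical: the oscillator is locked to the contact rhythm even though its natural period differs from the contact period.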

However, to adapt a CPG to a new environment, its parameters need to be modified. To cope with such environmental changes, the proposed approach uses a mental-simulation–based (or model-based) RL method [10,30,44] in which the learning system interacts only with a mentally simulated model. In this model-based approach, the samples used to improve the policy parameters are generated from the simulated model, without direct interaction with the real environment. Accordingly, once the environment model is properly identified, the policy parameters can be improved without using a real biped robot.
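
The following sketch shows the general shape of improving a parameter using rollouts on a simulated model only. It uses simple random-perturbation hill climbing as a stand-in, not the RL algorithm of the chapter; `simulated_model`, the objective, and all constants are hypothetical.

```python
import random


def simulated_model(state, theta):
    """Hypothetical learned one-step model: next state given parameter theta."""
    return 0.5 * state + theta


def rollout_return(theta, n_steps=20, target=1.0):
    """Accumulated negative squared error, computed on the model only."""
    state = 0.0
    total = 0.0
    for _ in range(n_steps):
        state = simulated_model(state, theta)
        total -= (state - target) ** 2
    return total


def improve_in_simulation(theta=0.0, iterations=200, sigma=0.1, seed=0):
    """Hill climbing: keep a random perturbation only if the simulated
    return improves. No real-robot interaction is involved."""
    rng = random.Random(seed)
    best = rollout_return(theta)
    for _ in range(iterations):
        candidate = theta + rng.gauss(0.0, sigma)
        value = rollout_return(candidate)
        if value > best:
            theta, best = candidate, value
    return theta, best


theta_final, return_final = improve_in_simulation()
```

Every evaluation inside the loop calls the simulated model, so the quality of the resulting parameter depends entirely on how well that model matches the real system.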

However, an accurate robot model and ground contact model are difficult to identify. Therefore, we use a Poincaré map, which has been used to evaluate the local stability of periodic patterns, as the mental simulation model, and we represent the Poincaré map with a nonparametric function approximation method.
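
As a rough illustration, a one-dimensional step-to-step map can be approximated from sampled pairs with a nonparametric estimator and then checked for local stability at its fixed point. The sketch below uses Nadaraya-Watson kernel regression as an assumed stand-in for whatever approximator the chapter uses; `true_map`, the sample grid, and the bandwidth are hypothetical.

```python
import math


def kernel_predict(x, xs, ys, bandwidth=0.1):
    """Nadaraya-Watson estimate of the map at x from samples (xs, ys)."""
    weights = [math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)


def true_map(x):
    """Hypothetical step-to-step map, used only to generate sample data.
    Its fixed point is 0.4 and its slope is 0.5."""
    return 0.5 * x + 0.2


# Sampled (state at step k, state at step k+1) pairs on a grid over [0, 1].
xs = [i / 50 for i in range(51)]
ys = [true_map(x) for x in xs]


def find_fixed_point(x0=0.9, iterations=100):
    """Iterate the approximated map until it settles on a fixed point."""
    x = x0
    for _ in range(iterations):
        x = kernel_predict(x, xs, ys)
    return x


def local_slope(x, h=1e-3):
    """Central-difference slope of the approximated map at x."""
    upper = kernel_predict(x + h, xs, ys)
    lower = kernel_predict(x - h, xs, ys)
    return (upper - lower) / (2.0 * h)


fixed_point = find_fixed_point()
slope = local_slope(fixed_point)
is_locally_stable = abs(slope) < 1.0
```

For a one-dimensional map, the periodic pattern is locally stable when the magnitude of the slope at the fixed point is below one; in higher dimensions the same check applies to the eigenvalues of the Jacobian.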

In the following sections, we introduce the two-layered biped learning system in detail. Section 7.2 introduces an implementation of the CPG model. Experimental results show that a biped robot model and a real humanoid robot are able to walk successfully without using carefully designed walking patterns. Section 7.3 introduces the model-based RL approach, which uses a Poincaré map as the mentally simulated environment. Walking patterns generated by the CPG are successfully improved by the proposed learning approach.
