Enhancing Robot Task Planning and Execution through Multi-Layer Large Language Models

Zhirong Luan; Yujun Lai; Rundong Huang; Shuanghao Bai; Yuedi Zhang; Haoran Zhang; Qian Wang

doi:10.3390/s24051687

Sensors (Basel). 2024 Mar 6;24(5):1687. doi: 10.3390/s24051687.

Authors

Zhirong Luan¹, Yujun Lai¹, Rundong Huang¹, Shuanghao Bai², Yuedi Zhang², Haoran Zhang², Qian Wang¹

Affiliations

¹ School of Electrical Engineering, Xi'an University of Technology, Xi'an 710000, China.
² College of Artificial Intelligence, Xi'an Jiaotong University, Xi'an 710000, China.

Abstract

Large language models have proven useful for robot task planning and task decomposition. However, using these models directly to instruct robots in task execution raises several problems: they struggle with more complex tasks, they have difficulty interacting effectively with the environment, and the machine control instructions they generate directly are limited in their practical executability. To address these problems, this research proposes a multi-layer large language model that improves a robot's ability to handle complex tasks. The proposed model combines multiple large language models to decompose tasks layer by layer, with the goal of improving the accuracy of task planning. During task decomposition, a visual language model is introduced as a sensor for environment perception. The perception results are fed into the large language model, which combines the task objectives with environmental information to generate robot motion planning tailored to the current environment. Furthermore, to improve the executability of the task planning outputs from the large language model, a semantic alignment method is introduced. This method aligns task planning descriptions with the functional requirements of robot motion, making the generated instructions more compatible and coherent. To validate the proposed approach, an experimental platform is built on an intelligent unmanned vehicle and used to empirically verify the multi-layer large language model on both robot task planning and execution.
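The layered flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the three planner and perception functions are hypothetical stand-ins that return fixed values where real language-model and visual-language-model calls would go, the command set `ROBOT_FUNCTIONS` is invented for illustration, and semantic alignment is approximated here by simple keyword overlap, which may differ from the method in the paper.

```python
# Hypothetical command set for an unmanned vehicle (assumption, not from the paper).
# Each command maps to keywords that a free-text planning step might contain.
ROBOT_FUNCTIONS = {
    "move_forward": "drive move go forward ahead",
    "turn_left": "turn rotate left",
    "turn_right": "turn rotate right",
    "stop": "stop halt wait",
}


def plan_subtasks(task: str) -> list[str]:
    """Stand-in for the upper LLM layer: split a task into subtasks."""
    table = {"pass the obstacle": ["avoid the obstacle", "resume the route"]}
    return table.get(task, [task])


def perceive_scene() -> str:
    """Stand-in for the visual language model used as a sensor."""
    return "obstacle on the left"


def plan_motions(subtask: str, scene: str) -> list[str]:
    """Stand-in for the lower LLM layer: combine a subtask with the scene."""
    if subtask == "avoid the obstacle":
        side = "right" if "left" in scene else "left"
        return [f"rotate to the {side}", "go ahead"]
    return ["go ahead", "halt"]


def align(step: str) -> str:
    """Map a free-text step to the closest command by keyword overlap (Jaccard)."""
    words = set(step.lower().split())

    def score(name: str) -> float:
        keywords = set(ROBOT_FUNCTIONS[name].split())
        return len(words & keywords) / len(words | keywords)

    # Ties (including no overlap at all) fall back to the first command listed.
    return max(ROBOT_FUNCTIONS, key=score)


def run(task: str) -> list[str]:
    """Decompose layer by layer, then align every step to an executable command."""
    scene = perceive_scene()
    commands = []
    for subtask in plan_subtasks(task):
        for step in plan_motions(subtask, scene):
            commands.append(align(step))
    return commands


print(run("pass the obstacle"))
```

With these stubs the example prints `['turn_right', 'move_forward', 'move_forward', 'stop']`. The point of the final alignment step is that the planner layers may phrase a motion freely ("rotate to the right"), while the vehicle only ever receives names from a fixed, executable command set.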

Keywords: large language models; natural language; robots; semantic alignment method.

Grants and funding

U21A20485/National Natural Science Foundation of China