Motion rehabilitation is increasingly required owing to an aging population and the prevalence of stroke, which makes human motion analysis increasingly important. In this work, a deep-learning-based system is proposed to track human motion from three-dimensional (3D) images; features from conventional red-green-blue (RGB) images, i.e., two-dimensional (2D) images, are used for comparison. The results indicate that 3D images have an advantage over 2D images because they carry information about spatial relationships, which suggests that the proposed system is a potential technology for human motion analysis applications.