Transportation Mode Detection Combining CNN and Vision Transformer with Sensors Recalibration Using Smartphone Built-In Sensors

Ye Tian; Dulmini Hettiarachchi; Shunsuke Kamijo

doi:10.3390/s22176453

Sensors (Basel). 2022 Aug 26;22(17):6453. doi: 10.3390/s22176453.

Authors

Ye Tian¹, Dulmini Hettiarachchi¹, Shunsuke Kamijo²

Affiliations

¹ Graduate School of Interdisciplinary Information Studies (GSII), The University of Tokyo, 4 Chome-6-1 Komaba, Meguro City, Tokyo 153-0041, Japan.
² The Institute of Industrial Science (IIS), The University of Tokyo, 4 Chome-6-1 Komaba, Meguro City, Tokyo 153-0041, Japan.

Abstract

Transportation Mode Detection (TMD) is an important task for Intelligent Transportation Systems (ITS) and lifelog applications. TMD using smartphone built-in sensors can be a low-cost and effective solution. In recent years, many studies have focused on TMD, yet they support only a limited number of modes and do not consider similar transportation modes or holding places, which limits further applications. In this paper, we propose a new network framework for TMD that combines structural and spatial interaction features and weights the contributions of multiple sensors, enabling the recognition of eight transportation modes, including four similar ones, and four holding places. First, raw data are segmented and transformed into spectrum images; ResNet and a Vision Transformer (ViT) are then used to extract structural and spatial interaction features, respectively. To account for the contributions of different sensors, the weight of each sensor is recalibrated using an Efficient Channel Attention (ECA) module. Finally, a Multi-Layer Perceptron (MLP) is introduced to fuse the two kinds of features. The performance of the proposed method is evaluated on the public Sussex-Huawei Locomotion-Transportation (SHL) dataset, where it outperforms the baselines by at least 10%.
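
The first and third steps of the abstract (segmenting raw data into spectrum images, then recalibrating per-sensor weights) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, window length, step size, and the fixed three-tap kernel are all assumptions made for illustration, and in an actual ECA module the 1-D convolution kernel would be learned rather than fixed. The ResNet, ViT, and MLP stages are omitted.

```python
import cmath
import math


def segment(signal, window, step):
    """Split a 1-D list of samples into fixed-length, possibly overlapping frames."""
    return [signal[i:i + window] for i in range(0, len(signal) - window + 1, step)]


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for the non-negative frequency bins of one frame."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectrogram(signal, window, step):
    """Stack per-frame spectra into a time-by-frequency 'spectrum image'."""
    return [magnitude_spectrum(frame) for frame in segment(signal, window, step)]


def eca_weights(channels, kernel=(0.25, 0.5, 0.25)):
    """ECA-style gate: one weight in (0, 1) per channel (here, per sensor image).

    Steps: global average pooling per channel, a 1-D convolution across
    neighbouring channels (zero-padded), then a sigmoid.
    The kernel values are hypothetical; a trained module would learn them.
    """
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in channels]
    pad = len(kernel) // 2
    padded = [0.0] * pad + pooled + [0.0] * pad
    mixed = [
        sum(w * padded[i + j] for j, w in enumerate(kernel))
        for i in range(len(pooled))
    ]
    return [1.0 / (1.0 + math.exp(-m)) for m in mixed]


def recalibrate(channels, weights):
    """Scale every value of each channel by that channel's weight."""
    return [[[w * v for v in row] for row in ch] for ch, w in zip(channels, weights)]


# Toy input (assumed): a sine wave with 4 cycles per 16-sample window.
signal = [math.sin(2 * math.pi * 4 * t / 16) for t in range(64)]
spec = spectrogram(signal, window=16, step=8)

# Two stand-in "sensor" channels: the spectrogram and a copy with doubled energy.
channels = [spec, [[2.0 * v for v in row] for row in spec]]
weights = eca_weights(channels)
scaled = recalibrate(channels, weights)
```

In this toy setup the higher-energy channel receives the larger gate value, which is the sense in which the recalibration reweights sensor contributions. A practical pipeline would replace the naive DFT with an FFT-based spectrogram routine and feed the recalibrated images to the feature extractors.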

Keywords: CNN; lifelog; sensors recalibration; spectrogram recognition; transportation mode detection; vision transformer.

MeSH terms

Neural Networks, Computer*
Smartphone*
Transportation

Grants and funding

This research received no external funding.