A Data Cleaning Method for Big Trace Data Using Movement Consistency

Sensors (Basel). 2018 Mar 9;18(3):824. doi: 10.3390/s18030824.

Abstract

Given the popularization of GPS technologies, the massive amount of spatiotemporal GPS traces collected by vehicles are becoming a new kind of big data source for urban geographic information extraction. The growing volume of the dataset, however, creates processing and management difficulties, while the low quality generates uncertainties when investigating human activities. Based on the conception of the error distribution law and position accuracy of the GPS data, we propose in this paper a data cleaning method for this kind of spatial big data using movement consistency. First, a trajectory is partitioned into a set of sub-trajectories using the movement characteristic points. In this process, GPS points indicate that the motion status of the vehicle has transformed from one state into another, and are regarded as the movement characteristic points. Then, GPS data are cleaned based on the similarities of GPS points and the movement consistency model of the sub-trajectory. The movement consistency model is built using the random sample consensus algorithm based on the high spatial consistency of high-quality GPS data. The proposed method is evaluated based on extensive experiments, using GPS trajectories generated by a sample of vehicles over a 7-day period in Wuhan city, China. The results show the effectiveness and efficiency of the proposed method.

Keywords: big data; data cleaning; movement consistency modeling; vehicle trajectory.