A Machine Learning Framework for Balancing Training Sets of Sensor Sequential Data Streams

Sensors (Basel). 2021 Oct 18;21(20):6892. doi: 10.3390/s21206892.

Abstract

The recent explosive growth in the number of smart technologies that rely on data collected from sensors and processed with machine learning classifiers has made the training data imbalance problem more visible than ever before. Class-imbalanced sets used to train models of various events of interest are among the main reasons a smart technology works incorrectly or even fails completely. This paper presents an attempt to resolve the imbalance problem in sensor sequential (time-series) data through training data augmentation. A framework powered by Unrolled Generative Adversarial Networks (Unrolled GAN) is developed and successfully used to balance the training data of smartphone accelerometer and gyroscope sensors in different contexts of road surface monitoring. Experiments with other sensor data from an open data collection are also conducted. It is demonstrated that the proposed approach improves classification performance on heavily imbalanced data (in the presented case study, the F1 score increased from 0.69 to 0.72, p<0.01). However, the effect is negligible for slightly imbalanced or inadequate training sets. This defines the limitations of the study, which future work will address by incorporating mechanisms for assessing training data quality into the proposed framework and by improving its computational efficiency.
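To make the balancing idea concrete, the sketch below tops up every minority class with synthetic windows until it matches the majority class count. It is a minimal illustration under stated assumptions, not the authors' framework: balancing to the majority count is an assumed target, the names `balance_training_set` and `jitter_generator` are hypothetical, and `jitter_generator` is a simple noise-injection stand-in for a trained Unrolled GAN generator (the GAN itself is not reproduced here).

```python
import random
from collections import Counter
from typing import Callable, List, Optional, Sequence, Tuple

# One window = a list of per-timestep sensor readings, e.g. (ax, ay, az).
Window = List[Tuple[float, ...]]
# A generator takes the real windows of one class and returns n synthetic ones.
Generator = Callable[[Sequence[Window], int], List[Window]]


def jitter_generator(real: Sequence[Window], n: int, sigma: float = 0.05,
                     rng: Optional[random.Random] = None) -> List[Window]:
    """Hypothetical placeholder for a trained Unrolled GAN generator.

    Adds Gaussian noise to randomly chosen real windows. This is plain
    jittering, NOT the GAN-based method described in the paper.
    """
    if rng is None:
        rng = random.Random(0)  # fixed seed for reproducible output
    synthetic = []
    for _ in range(n):
        base = rng.choice(real)
        synthetic.append([tuple(v + rng.gauss(0.0, sigma) for v in step)
                          for step in base])
    return synthetic


def balance_training_set(windows: Sequence[Window], labels: Sequence[str],
                         generate: Generator):
    """Augment each minority class up to the majority class size (assumed target)."""
    counts = Counter(labels)
    target = max(counts.values())
    out_windows, out_labels = list(windows), list(labels)
    for cls, count in counts.items():
        deficit = target - count
        if deficit == 0:
            continue  # majority class: nothing to add
        real = [w for w, y in zip(windows, labels) if y == cls]
        synthetic = generate(real, deficit)
        if len(synthetic) != deficit:
            raise ValueError(f"generator returned {len(synthetic)} windows, expected {deficit}")
        out_windows.extend(synthetic)
        out_labels.extend([cls] * deficit)
    return out_windows, out_labels


if __name__ == "__main__":
    # Toy data: 3-axis accelerometer windows of 4 timesteps with
    # hypothetical road-surface labels.
    rng = random.Random(42)

    def make_window() -> Window:
        return [tuple(rng.uniform(-1, 1) for _ in range(3)) for _ in range(4)]

    windows = [make_window() for _ in range(12)]
    labels = ["smooth"] * 9 + ["pothole"] * 2 + ["bump"] * 1
    X, y = balance_training_set(windows, labels, jitter_generator)
    print(len(X))                     # 27
    print(sorted(Counter(y).items())) # [('bump', 9), ('pothole', 9), ('smooth', 9)]
```

Passing the generator as a function keeps the balancing loop independent of how synthetic windows are produced, so a trained sequence generator could replace the jitter stand-in without changing the rest of the pipeline.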

Keywords: Unrolled GAN; class-imbalanced data; sensor sequential data.

MeSH terms

  • Data Accuracy*
  • Data Collection
  • Machine Learning*