A Machine Learning Framework for Balancing Training Sets of Sensor Sequential Data Streams


Setiawan, Budi Darma; Serdült, Uwe; Kryssanov, Victor (2021). A Machine Learning Framework for Balancing Training Sets of Sensor Sequential Data Streams. Sensors, 21(20):6892.

Abstract

The recent explosive growth in the number of smart technologies relying on data collected from sensors and processed with machine learning classifiers has made the training data imbalance problem more visible than ever before. Class-imbalanced sets used to train models of various events of interest are among the main reasons for a smart technology to work incorrectly or even to fail completely. This paper presents an attempt to resolve the imbalance problem in sensor sequential (time-series) data through training data augmentation. An Unrolled Generative Adversarial Networks (Unrolled GAN)-powered framework is developed and successfully used to balance the training data of smartphone accelerometer and gyroscope sensors in different contexts of road surface monitoring. Experiments with other sensor data from an open data collection are also conducted. It is demonstrated that the proposed approach improves classification performance in the case of heavily imbalanced data (the F1 score increased from 0.69 to 0.72, p < 0.01, in the presented case study). However, the effect is negligible in the case of slightly imbalanced or inadequate training sets. This defines the limitations of the study, which future work will address by incorporating mechanisms for assessing training data quality into the proposed framework and by improving its computational efficiency.
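
The balancing step described in the abstract can be illustrated with a minimal sketch. It assumes that each training example is a fixed-length window of shape (timesteps, channels), for instance 3-axis accelerometer plus 3-axis gyroscope readings, and that every minority class is topped up with synthetic windows until it matches the size of the largest class. The function names `balancing_deficits`, `jitter_generator` and `balance_training_set` are hypothetical. In particular, `jitter_generator` is a deliberately simple stand-in (resampling real windows of one class and adding Gaussian noise), not the Unrolled GAN used in the paper; a trained generator would be passed in its place.

```python
from collections import Counter
from typing import Callable, Dict, Tuple

import numpy as np

# Hypothetical generator signature: (real windows of one class, how many to make, RNG) -> synthetic windows
Generator = Callable[[np.ndarray, int, np.random.Generator], np.ndarray]


def balancing_deficits(labels: np.ndarray) -> Dict[int, int]:
    """Number of synthetic windows each class needs to match the largest class."""
    counts = Counter(labels.tolist())
    target = max(counts.values())
    return {cls: target - n for cls, n in counts.items()}


def jitter_generator(x_cls: np.ndarray, n: int, rng: np.random.Generator,
                     sigma: float = 0.05) -> np.ndarray:
    """Stand-in for a trained generator (NOT the paper's Unrolled GAN):
    resample real windows of one class and add Gaussian noise to them."""
    idx = rng.integers(0, len(x_cls), size=n)
    return x_cls[idx] + rng.normal(0.0, sigma, size=(n,) + x_cls.shape[1:])


def balance_training_set(x: np.ndarray, y: np.ndarray, generate: Generator,
                         seed: int = 0) -> Tuple[np.ndarray, np.ndarray]:
    """Append synthetic windows to every minority class; real data is kept as-is."""
    rng = np.random.default_rng(seed)
    xs, ys = [x], [y]
    for cls, deficit in balancing_deficits(y).items():
        if deficit == 0:
            continue  # the majority class needs no augmentation
        synthetic = generate(x[y == cls], deficit, rng)
        xs.append(synthetic)
        ys.append(np.full(deficit, cls, dtype=y.dtype))
    return np.concatenate(xs), np.concatenate(ys)
```

For example, with 90 windows of one class and 10 of another, `balance_training_set` returns 180 windows, 90 per class. Keeping the generator as a parameter separates the balancing bookkeeping from the choice of augmentation model.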

Statistics

Citations
4 citations in Web of Science®
5 citations in Scopus®

Downloads
8 downloads since deposited on 17 Nov 2021
1 download in the past 12 months

Additional indexing

Item Type: Journal Article, refereed, original work
Communities & Collections: 08 Research Priority Programs > Digital Society Initiative
Dewey Decimal Classification: 340 Law
Scopus Subject Areas:
  • Physical Sciences > Analytical Chemistry
  • Physical Sciences > Information Systems
  • Physical Sciences > Atomic and Molecular Physics, and Optics
  • Life Sciences > Biochemistry
  • Physical Sciences > Instrumentation
  • Physical Sciences > Electrical and Electronic Engineering
Uncontrolled Keywords: Electrical and Electronic Engineering, Biochemistry, Instrumentation, Atomic and Molecular Physics, and Optics, Analytical Chemistry
Language: English
Date: 18 October 2021
Deposited On: 17 Nov 2021 14:01
Last Modified: 25 Feb 2024 02:48
Publisher: MDPI Publishing
ISSN: 1424-8220
OA Status: Gold
Free access at: PubMed ID. An embargo period may apply.
Publisher DOI: https://doi.org/10.3390/s21206892
PubMed ID: 34696105
  • Content: Published Version
  • Licence: Creative Commons: Attribution 4.0 International (CC BY 4.0)