{"title":"CausalBatch","authors":"Lloyd Pellatt, D. Roggen","doi":"10.1145/3410530.3414365","DOIUrl":null,"url":null,"abstract":"Deep neural networks consisting of a combination of convolutional feature extractor layers and Long Short Term Memory (LSTM) recurrent layers are widely used models for activity recognition from wearable sensors ---referred to as DeepConvLSTM architectures hereafter. However, the subtleties of training these models on sequential time series data is not often discussed in the literature. Continuous sensor data must be segmented into temporal 'windows', and fed through the network to produce a loss which is used to update the parameters of the network. If trained naively using batches of randomly selected data as commonly reported, then the temporal horizon (the maximum delay at which input samples can effect the output of the model) of the network is limited to the length of the window. An alternative approach, which we will call CausalBatch training, is to construct batches deliberately such that each consecutive batch contains windows which are contiguous in time with the windows of the previous batch, with only the first batch in the CausalBatch consisting of randomly selected windows. After a given number of consecutive batches (referred to as the CausalBatch duration τ), the LSTM states are reset, new random starting points are chosen from the dataset and a new CausalBatch is started. This approach allows us to increase the temporal horizon of the network without increasing the window size, which enables networks to learn data dependencies on a longer timescale without increasing computational complexity. We evaluate these two approaches on the Opportunity dataset. We find that using the CausalBatch method we can reduce the training time of DeepConvLSTM by up to 90%, while increasing the user-independent accuracy by up to 6.3% and the class weighted F1 score by up to 5.9% compared to the same model trained by random batch training with the best performing choice of window size for the latter. Compared to the same model trained using the same window length, and therefore the same computational complexity and almost identical training time, we observe an 8.4% increase in accuracy and 14.3% increase in weighted F1 score. We provide the source code for all experiments as well as a Pytorch reference implementation of DeepConvLSTM in a public github repository.","PeriodicalId":7183,"journal":{"name":"Adjunct Proceedings of the 2020 ACM International Joint Conference on Pervasive and Ubiquitous Computing and Proceedings of the 2020 ACM International Symposium on Wearable Computers","volume":"1 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2020-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Adjunct Proceedings of the 2020 ACM International Joint Conference on Pervasive and Ubiquitous Computing and Proceedings of the 2020 ACM International Symposium on Wearable Computers","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3410530.3414365","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1
Abstract
Deep neural networks that combine convolutional feature-extractor layers with Long Short-Term Memory (LSTM) recurrent layers are widely used models for activity recognition from wearable sensors; we refer to them hereafter as DeepConvLSTM architectures. However, the subtleties of training these models on sequential time-series data are rarely discussed in the literature. Continuous sensor data must be segmented into temporal 'windows' and fed through the network to produce a loss, which is used to update the parameters of the network. If the network is trained naively using batches of randomly selected windows, as is commonly reported, then its temporal horizon (the maximum delay at which input samples can affect the output of the model) is limited to the length of the window. An alternative approach, which we call CausalBatch training, is to construct batches deliberately so that each consecutive batch contains windows that are contiguous in time with the windows of the previous batch, with only the first batch of the CausalBatch consisting of randomly selected windows. After a given number of consecutive batches (referred to as the CausalBatch duration τ), the LSTM states are reset, new random starting points are chosen from the dataset, and a new CausalBatch is started. This approach increases the temporal horizon of the network without increasing the window size, which lets the network learn data dependencies over a longer timescale without increasing computational complexity. We evaluate the two approaches on the Opportunity dataset. Using the CausalBatch method, we can reduce the training time of DeepConvLSTM by up to 90%, while increasing user-independent accuracy by up to 6.3% and the class-weighted F1 score by up to 5.9% compared with the same model trained with random batches using the best-performing window size for that approach. Compared with the same model trained using the same window length, and therefore the same computational complexity and almost identical training time, we observe an 8.4% increase in accuracy and a 14.3% increase in weighted F1 score. We provide the source code for all experiments, as well as a PyTorch reference implementation of DeepConvLSTM, in a public GitHub repository.
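To make the batching scheme concrete, the sketch below illustrates one way a CausalBatch training loop could look in PyTorch. It is a minimal, hypothetical example, not the authors' reference implementation: all names (`causal_batch_indices`, `train_causal_batches`, `tau`, `window_len`, the assumption that the model takes and returns an LSTM hidden state) are illustrative assumptions. The key idea it shows is that batch 0 of each CausalBatch starts at random positions, every subsequent batch shifts each start by one window length so windows stay contiguous in time, the hidden state is carried across the τ consecutive batches, and the state is reset when a new CausalBatch begins.

```python
import torch


def causal_batch_indices(n_samples, window_len, batch_size, tau, generator=None):
    """Yield tau consecutive batches of window start indices.

    Batch 0 starts at random positions; batch t shifts every start by
    t * window_len, so each window is contiguous in time with the
    corresponding window of the previous batch.
    """
    max_start = n_samples - tau * window_len
    starts = torch.randint(0, max_start, (batch_size,), generator=generator)
    for t in range(tau):
        yield starts + t * window_len


def train_causal_batches(model, data, labels, window_len, batch_size, tau,
                         optimizer, criterion, n_causal_batches):
    # data: (n_samples, n_channels) tensor; labels: (n_samples,) tensor.
    # Assumption: model(windows, hidden) returns (predictions, new_hidden).
    for _ in range(n_causal_batches):
        hidden = None  # LSTM state is reset at the start of each CausalBatch
        for starts in causal_batch_indices(len(data), window_len,
                                           batch_size, tau):
            # Gather a batch of windows that follow on from the previous batch.
            windows = torch.stack([data[int(s):int(s) + window_len]
                                   for s in starts])
            targets = torch.stack([labels[int(s) + window_len - 1]
                                   for s in starts])

            out, hidden = model(windows, hidden)        # carry state forward
            hidden = tuple(h.detach() for h in hidden)  # truncate BPTT at batch edges

            loss = criterion(out, targets)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```

Carrying (and detaching) the hidden state across the τ contiguous batches is what extends the temporal horizon beyond a single window without enlarging the window itself.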