Shapelets-based Data Augmentation for Time Series Classification
Authors: Peiyu Li, S. F. Boubrahimi, S. M. Hamdi
Published in: 2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA), pp. 1373-1378
Publication date: 2021-12-01
DOI: 10.1109/ICMLA52953.2021.00222 (https://doi.org/10.1109/ICMLA52953.2021.00222)
Citations: 3
Abstract
Data augmentation is widely used to address class imbalance and to supply more training data to data-hungry models. For time series data, an augmentation method must take the dependence between variables into account. In this paper, we propose a method that preserves important relations between variables while augmenting time series data. Specifically, we combine the shapelet transform with the Synthetic Minority Oversampling Technique (SMOTE). The shapelet transform extracts the most prominent shapelets from the training set, and these shapelets are then used during oversampling. Our method preserves the extracted shapelets as the key part of each synthetic sample and uses SMOTE to generate the remaining data points. Compared with pure SMOTE, our method retains the important shapelets and so maintains the correlations between interdependent variables, which also makes the synthetic samples more interpretable.
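The idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, and it rests on several assumptions that the abstract does not state: the series are univariate and of equal length, the shapelet occupies the same window in every minority-class series, and that window (`shapelet_start`, `shapelet_len`) is already known rather than found by a shapelet transform. The function names `smote_interpolate` and `shapelet_preserving_smote` are hypothetical. SMOTE interpolation fills the whole series first, and the shapelet window is then overwritten with the unmodified values from the base sample.

```python
import numpy as np


def smote_interpolate(x, neighbor, rng):
    """Standard SMOTE step: a random point on the segment from x to neighbor."""
    gap = rng.random()  # uniform in [0, 1)
    return x + gap * (neighbor - x)


def shapelet_preserving_smote(X_min, shapelet_start, shapelet_len,
                              n_new, k=3, seed=0):
    """Generate n_new synthetic minority-class series (illustrative sketch).

    X_min          : array of shape (n_series, series_length), minority class only
    shapelet_start : assumed start index of the shapelet window (hypothetical input)
    shapelet_len   : assumed length of the shapelet window (hypothetical input)
    """
    rng = np.random.default_rng(seed)
    n_series, length = X_min.shape
    k = min(k, n_series - 1)  # cannot have more neighbors than other series
    stop = shapelet_start + shapelet_len
    synthetic = np.empty((n_new, length))

    for i in range(n_new):
        # Pick a random base series from the minority class.
        idx = rng.integers(n_series)
        base = X_min[idx]

        # Find its k nearest minority neighbors by Euclidean distance.
        dists = np.linalg.norm(X_min - base, axis=1)
        dists[idx] = np.inf  # exclude the base series itself
        neighbors = np.argsort(dists)[:k]
        neighbor = X_min[rng.choice(neighbors)]

        # SMOTE generates every point, then the shapelet window is
        # restored from the base series so it is kept unchanged.
        new = smote_interpolate(base, neighbor, rng)
        new[shapelet_start:stop] = base[shapelet_start:stop]
        synthetic[i] = new

    return synthetic
```

A call such as `shapelet_preserving_smote(X_min, shapelet_start=5, shapelet_len=4, n_new=10)` returns ten synthetic series whose window `[5, 9)` matches an original minority series exactly, while the remaining points are interpolated. In a fuller pipeline, the window would come from a shapelet extraction step on the training set, and the shapelet location could differ per series.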