Sampling approaches to reduce very frequent seasonal time series

IF 3 | Zone 4, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Expert Systems | Pub Date: 2024-07-26 | DOI: 10.1111/exsy.13690

Afonso Baldo, Paulo J. S. Ferreira, João Mendes‐Moreira


Citations: 0

Abstract

With technological advancements, much data is being captured by sensors, smartphones, wearable devices, and so forth. These vast datasets are stored in data centres and utilized to forge data‐driven models for the condition monitoring of infrastructures and systems through future data mining tasks. However, these datasets often surpass the processing capabilities of traditional information systems and methodologies due to their significant size. Additionally, not all samples within these datasets contribute valuable information during the model training phase, leading to inefficiencies. The processing and training of Machine Learning algorithms become time‐consuming, and storing all the data demands excessive space, contributing to the Big Data challenge. In this paper, we propose two novel techniques to reduce large time‐series datasets into more compact versions without undermining the predictive performance of the resulting models. These methods also aim to decrease the time required for training the models and the storage space needed for the condensed datasets. We evaluated our techniques on five public datasets, employing three Machine Learning algorithms: Holt‐Winters, SARIMA, and LSTM. The outcomes indicate that for most of the datasets examined, our techniques maintain, and in several instances enhance, the forecasting accuracy of the models. Moreover, we significantly reduced the time required to train the Machine Learning algorithms employed.
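The abstract does not describe the two proposed techniques, so the following is not the authors' method. It is only a minimal sketch of what reducing a high-frequency seasonal series can look like, using plain block averaging and assuming the reduction factor divides the seasonal period so that the shortened series keeps an integer period. The function names, the synthetic sine series, and the chosen period and factor are all hypothetical.

```python
import math


def block_mean_downsample(series, factor):
    """Average each run of `factor` consecutive samples.

    Any trailing partial block is dropped. This is generic aggregation,
    not one of the techniques proposed in the paper.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    n_blocks = len(series) // factor
    return [
        sum(series[i * factor:(i + 1) * factor]) / factor
        for i in range(n_blocks)
    ]


def reduced_period(period, factor):
    """Seasonal period of the downsampled series (assumes factor divides period)."""
    if period % factor != 0:
        raise ValueError("factor must divide the seasonal period")
    return period // factor


# Hypothetical data: 10 seasonal cycles of 24 samples each.
period = 24
series = [10 + 5 * math.sin(2 * math.pi * t / period) for t in range(period * 10)]

factor = 4
reduced = block_mean_downsample(series, factor)
new_period = reduced_period(period, factor)

print(len(series), "->", len(reduced), "samples; period", period, "->", new_period)
```

A seasonal model such as Holt-Winters or SARIMA would then be fitted on `reduced` with `new_period` as its seasonal length; whether forecasting accuracy survives the reduction has to be checked per dataset, which is the kind of evaluation the abstract reports.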

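Source journal

Expert Systems (Engineering & Technology - Computer Science: Theory & Methods)

CiteScore: 7.40

Self-citation rate: 6.10%

Articles published: 266

Review time: 24 months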

Journal description: Expert Systems: The Journal of Knowledge Engineering publishes papers dealing with all aspects of knowledge engineering, including individual methods and techniques in knowledge acquisition and representation, and their application in the construction of systems – including expert systems – based thereon. Detailed scientific evaluation is an essential part of any paper. As well as traditional application areas, such as Software and Requirements Engineering, Human-Computer Interaction, and Artificial Intelligence, we are aiming at the new and growing markets for these technologies, such as Business, Economy, Market Research, and Medical and Health Care. The shift towards this new focus will be marked by a series of special issues covering hot and emergent topics.