Kennard-Stone Balance Algorithm for Time-series Big Data Stream Mining

2020 International Conference on Data Mining Workshops (ICDMW). Pub Date: 2020-11-01. DOI: 10.1109/ICDMW51313.2020.00122

Tengyue Li, S. Fong, Yaoyang Wu, A. J. Tallón-Ballesteros


Citations: 1

Abstract

Advances in IoT and sensor applications mean that time series are now generated more easily and in larger quantities than ever. Training a prediction model effectively on such big data streams poses challenges for machine learning. Data sampling is an important pre-processing technique for handling oversized data: it converts a huge data stream into a manageable, representative subset before the data is loaded into a model induction process. This paper proposes a novel data conversion method, the Kennard-Stone Balance (KSB) algorithm. Over the past decades, researchers have used Kennard-Stone (KS) to partition a bounded dataset into appropriate training and testing portions for cross-validation. In this proposal, we extend KS to balance the sub-sampled data with respect to the class distribution, using round-robin selection. It is also the first time KS has been applied to time series to extract a meaningful representation of big data streams and thereby improve the performance of a machine learning model. Preliminary simulation results show the advantages of KSB. Analysis, discussion, and future work are reported in this short paper. KSB is expected to offer a promising new data sampling alternative for data stream mining.
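The abstract does not spell out the KSB procedure, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes each sample is a numeric feature vector (for a time series, for example, a fixed-length window), uses Euclidean distance, applies the classic Kennard-Stone rule within each class, and then draws from the per-class orderings in round-robin turns. The function names `kennard_stone` and `ks_balanced_sample` and the `budget` parameter are hypothetical.

```python
import math
from itertools import combinations


def kennard_stone(points, k):
    """Return indices of up to k points picked by the classic Kennard-Stone rule."""
    n = len(points)
    k = min(k, n)
    if k <= 0:
        return []
    if n == 1:
        return [0]
    # Seed with the two mutually farthest points.
    i, j = max(combinations(range(n), 2),
               key=lambda p: math.dist(points[p[0]], points[p[1]]))
    selected = [i, j]
    # min_dist[c]: distance from candidate c to its nearest selected point.
    min_dist = {c: min(math.dist(points[c], points[i]),
                       math.dist(points[c], points[j]))
                for c in range(n) if c not in (i, j)}
    while len(selected) < k:
        # Take the candidate that is farthest from everything selected so far.
        nxt = max(min_dist, key=min_dist.get)
        selected.append(nxt)
        del min_dist[nxt]
        for c in min_dist:
            min_dist[c] = min(min_dist[c], math.dist(points[c], points[nxt]))
    return selected[:k]


def ks_balanced_sample(points, labels, budget):
    """Assumed KSB reading: KS ordering per class, then round-robin over classes."""
    by_class = {}
    for idx, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(idx)
    # Build one KS-ordered queue of global indices per class.
    queues = {}
    for lab, idxs in by_class.items():
        order = kennard_stone([points[i] for i in idxs], min(budget, len(idxs)))
        queues[lab] = [idxs[o] for o in order]
    chosen, pos = [], 0
    # Each pass takes at most one point from every class that still has points.
    while len(chosen) < budget and any(pos < len(q) for q in queues.values()):
        for q in queues.values():
            if pos < len(q) and len(chosen) < budget:
                chosen.append(q[pos])
        pos += 1
    return chosen
```

Under these assumptions, a set with seven majority-class points and two minority-class points and a budget of four yields two points from each class, whereas plain Kennard-Stone over the pooled data would ignore the labels. The pairwise seeding step is quadratic in the number of points, so a streaming setting would presumably apply it per window or chunk rather than to the whole stream.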

