Kennard-Stone Balance Algorithm for Time-series Big Data Stream Mining

Tengyue Li, S. Fong, Yaoyang Wu, A. J. Tallón-Ballesteros
Published in: 2020 International Conference on Data Mining Workshops (ICDMW), 2020-11-01
DOI: 10.1109/ICDMW51313.2020.00122 (https://doi.org/10.1109/ICDMW51313.2020.00122)
Citations: 1

Abstract

Advances in IoT and sensor applications now generate time series more easily and in larger quantities than ever. Training a prediction model effectively on such big data streams poses challenges for machine learning. Data sampling is an important pre-processing technique for handling oversized data: it converts a huge data stream into a manageable, representative subset before the data are loaded into a model induction process. This paper proposes a novel data conversion method, the Kennard-Stone Balance (KSB) algorithm. Over the past decades, researchers have used the Kennard-Stone (KS) algorithm to partition a bounded dataset into appropriate training and testing portions for cross-validation. The proposed method extends KS to balance the sub-sampled data with respect to the class distribution, using round-robin selection. It is also the first time KS has been applied to time series in order to extract a meaningful representation of big data streams and thereby improve the performance of a machine learning model. Preliminary simulation results show the advantages of KSB. Analysis, discussion, and future work are reported in this short paper. KSB is anticipated to offer a new data sampling alternative with considerable potential for data stream mining.
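The abstract does not spell out how KSB works, so the following Python sketch is only an illustration under stated assumptions, not the authors' implementation. The function kennard_stone_order follows the classic Kennard-Stone procedure (seed with the two farthest points, then repeatedly add the point farthest from the selected set). The function balanced_ks_sample, its budget parameter, and the choice to interleave per-class KS picks are hypothetical readings of "balancing by round-robin". Each observation is assumed to be a fixed-length numeric tuple with a class label; streaming and windowing of the time series are not modeled.

```python
from itertools import cycle
from math import dist


def kennard_stone_order(points):
    """Return indices of `points` in classic Kennard-Stone selection order."""
    n = len(points)
    if n <= 2:
        return list(range(n))
    # Seed with the two points that are farthest apart.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist(points[pair[0]], points[pair[1]]),
    )
    selected = [first, second]
    # min_dist[i]: distance from point i to its nearest selected point.
    min_dist = [min(dist(p, points[first]), dist(p, points[second])) for p in points]
    remaining = [i for i in range(n) if i not in (first, second)]
    while remaining:
        # Add the remaining point that is farthest from the selected set.
        nxt = max(remaining, key=lambda i: min_dist[i])
        selected.append(nxt)
        remaining.remove(nxt)
        for i in remaining:
            min_dist[i] = min(min_dist[i], dist(points[i], points[nxt]))
    return selected


def balanced_ks_sample(points, labels, budget):
    """Hypothetical round-robin balancing (an assumption, not the paper's
    method): take Kennard-Stone picks from each class in turn until
    `budget` indices are chosen or every class is exhausted."""
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    # Per-class KS order, mapped back to indices in the full dataset.
    queues = {}
    for label, members in by_class.items():
        order = kennard_stone_order([points[i] for i in members])
        queues[label] = [members[k] for k in order]
    positions = {label: 0 for label in queues}
    sample = []
    for label in cycle(sorted(queues)):
        exhausted = all(positions[c] >= len(queues[c]) for c in queues)
        if len(sample) >= budget or exhausted:
            break
        if positions[label] < len(queues[label]):
            sample.append(queues[label][positions[label]])
            positions[label] += 1
    return sample


# Toy data: class 0 is the majority (4 points), class 1 the minority (2 points).
points = [(0.0,), (1.0,), (2.0,), (10.0,), (0.5,), (9.0,)]
labels = [0, 0, 0, 0, 1, 1]
sample = balanced_ks_sample(points, labels, budget=4)
```

On this toy data the sample holds two indices from each class even though class 0 is twice as large, which is the balancing effect the abstract describes. The all-pairs seeding step costs quadratic time, so a real data stream would likely need windowing or an incremental variant; that part is left open here.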