A Discretization-based Ensemble Learning Method for Classification in High-Speed Data Streams

João Roberto Bertini Junior
{"title":"A Discretization-based Ensemble Learning Method for Classification in High-Speed Data Streams","authors":"João Roberto Bertini Junior","doi":"10.1109/IJCNN.2019.8851703","DOIUrl":null,"url":null,"abstract":"Data stream mining has attracted much attention of the machine learning community in the last decade. Motivated by the upcoming issues associated with data stream applications, such as concept drift and the velocity into which data needs to be processed, several methods have been proposed in the literature, most of them resulting from adaptations of traditional algorithms. Such methods are forced to satisfy hard requirements of restricted memory and processing time, while keeping track of the performance at the same time. In the classification context, ensembles are an effective and elegant way to handle this task. And mostly, the bottleneck of processing time and memory of an ensemble relies on the employed base learner and on the ensemble updating policy. This paper addresses both issues by proposing: 1) a fast base learning algorithm, which relies on discretizing every attribute range into disjoint intervals and associating, to each of them, a posterior probability relating it to a class; and 2) a static ensemble that comprises such base learners and handles concept drift without replacing base learners. 
Results comparing the proposed ensemble method to six ensemble approaches, on artificial and real data streams, showed it yields comparable results but with lower computational time; which makes the proposed ensemble an efficient alternative to high-speed data streams.","PeriodicalId":134599,"journal":{"name":"IEEE International Joint Conference on Neural Network","volume":"292 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE International Joint Conference on Neural Network","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IJCNN.2019.8851703","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

Data stream mining has attracted much attention from the machine learning community over the last decade. Motivated by the issues associated with data stream applications, such as concept drift and the velocity at which data must be processed, several methods have been proposed in the literature, most of them adaptations of traditional algorithms. Such methods must satisfy hard constraints on memory and processing time while maintaining predictive performance. In the classification context, ensembles are an effective and elegant way to handle this task, and the processing-time and memory bottleneck of an ensemble lies mostly in the employed base learner and in the ensemble updating policy. This paper addresses both issues by proposing: 1) a fast base learning algorithm, which discretizes every attribute range into disjoint intervals and associates, with each interval, a posterior probability relating it to a class; and 2) a static ensemble that comprises such base learners and handles concept drift without replacing them. Results comparing the proposed ensemble to six ensemble approaches, on artificial and real data streams, show that it yields comparable accuracy at lower computational cost, which makes the proposed ensemble an efficient alternative for high-speed data streams.
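To make the base-learner idea concrete, the sketch below discretizes each attribute's range into disjoint equal-width intervals and keeps a smoothed class-posterior estimate per interval; prediction combines the per-attribute posteriors. This is only an illustration of the general technique named in the abstract, not the paper's algorithm: the bin count, equal-width binning, Laplace smoothing, and naive-Bayes-style combination across attributes are all assumptions.

```python
import numpy as np

class DiscretizedPosteriorLearner:
    """Sketch: per-attribute equal-width intervals, each holding a
    Laplace-smoothed posterior over classes (an assumption, not the
    paper's exact method)."""

    def __init__(self, n_bins=10, n_classes=2):
        self.n_bins = n_bins
        self.n_classes = n_classes

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=int)
        n_samples, n_attrs = X.shape
        self.mins_ = X.min(axis=0)
        self.maxs_ = X.max(axis=0)
        # counts_[a, b, c] = instances of class c in bin b of attribute a,
        # initialized to 1 (Laplace smoothing).
        self.counts_ = np.ones((n_attrs, self.n_bins, self.n_classes))
        bins = self._bin_indices(X)
        for a in range(n_attrs):
            np.add.at(self.counts_[a], (bins[:, a], y), 1)
        return self

    def _bin_indices(self, X):
        # Map each value to the index of its disjoint interval.
        width = (self.maxs_ - self.mins_) / self.n_bins
        width = np.where(width == 0, 1.0, width)  # guard constant attributes
        idx = ((X - self.mins_) / width).astype(int)
        return np.clip(idx, 0, self.n_bins - 1)

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        bins = self._bin_indices(X)
        # Per-interval class posteriors, combined in log space under an
        # attribute-independence assumption.
        post = self.counts_ / self.counts_.sum(axis=2, keepdims=True)
        log_scores = np.zeros((X.shape[0], self.n_classes))
        for a in range(bins.shape[1]):
            log_scores += np.log(post[a, bins[:, a]])
        return log_scores.argmax(axis=1)
```

Because each prediction only requires a bin lookup per attribute, both training and inference are constant-time per instance, which is the property that makes such a base learner attractive for high-speed streams.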