An ensemble classification approach for handling spatio-temporal drifts in partially labeled data streams

Proceedings of the 2014 IEEE 15th International Conference on Information Reuse and Integration (IEEE IRI 2014). Pub Date: 2014-08-01. DOI: 10.1109/IRI.2014.7051961

Tegjyot Singh Sethi, M. Kantardzic, Elaheh Arabmakki, Hanqing Hu

Citations: 10

Abstract

Classifying streaming data requires learning in an environment where the distribution of the incoming data may change continuously. Stream classification methods must adapt to these changes under limited time and memory. Moreover, it is unrealistic to expect every sample in the stream to be labeled, because labeling is often time-consuming and expensive. This paper proposes a new ensemble classification approach that handles spatio-temporal drifts in streams even when labeling is limited. The proposed methodology uses grid density clustering to track drifts in the spatial configuration of the data, and maintains a set of classifier models local to each cluster to track that cluster's evolution over time. A structured weighted aggregation of the models across all clusters produces the overall prediction for a new sample. Additionally, a uniform sampling approach suited to the grid representation of the clusters is proposed; it selects the samples to be labeled while preserving the grid density information of the stream. This yields a more representative selection of samples to label, which improves drift detection and handling while staying within the labeling budget. Experimental comparison with state-of-the-art drift handling systems shows that the proposed methodology achieves high classification performance with a manageable ensemble size and with only 10% of the samples labeled.
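The abstract does not spell out the sampling procedure, so the following is only a minimal sketch of one way a grid-aware labeling budget could work, not the authors' algorithm. It assumes fixed-width square grid cells and a per-cell proportional draw; the names `grid_cell`, `sample_for_labeling`, `cell_width`, and `budget` are hypothetical and chosen for illustration.

```python
import random
from collections import defaultdict


def grid_cell(point, cell_width):
    """Map a point to the integer coordinates of the grid cell containing it."""
    return tuple(int(x // cell_width) for x in point)


def sample_for_labeling(points, cell_width, budget=0.10, seed=0):
    """Return sorted indices of points to send for labeling.

    Assumption (not from the paper): each occupied cell contributes roughly
    `budget` of its own points, so dense cells receive proportionally more
    labels and the labeled subset keeps the grid density profile.
    """
    rng = random.Random(seed)

    # Group point indices by the grid cell they fall into.
    cells = defaultdict(list)
    for index, point in enumerate(points):
        cells[grid_cell(point, cell_width)].append(index)

    chosen = []
    for members in cells.values():
        # At least one label per occupied cell, so sparse cells stay visible;
        # this can push the total slightly above the nominal budget.
        k = max(1, round(budget * len(members)))
        chosen.extend(rng.sample(members, k))
    return sorted(chosen)


if __name__ == "__main__":
    dense = [(0.1 + 0.01 * i, 0.5) for i in range(80)]   # all in cell (0, 0)
    sparse = [(5.2, 5.0 + 0.01 * i) for i in range(20)]  # all in cell (5, 5)
    picked = sample_for_labeling(dense + sparse, cell_width=1.0)
    print(len(picked), "of", len(dense) + len(sparse), "points selected")
```

With 80 points in one cell and 20 in another, a 10% budget selects 8 and 2 points respectively, so the labeled subset mirrors the 4:1 density ratio of the stream. A plain uniform draw over the whole stream would only match that ratio on average.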
