Novelty Detection and Online Learning for Chunk Data Streams.

IF 20.8 | Region 1: Computer Science | JCR Q1: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Pattern Analysis and Machine Intelligence Pub Date : 2021-07-01 Epub Date: 2021-06-08 DOI:10.1109/TPAMI.2020.2965531

Yi Wang, Yi Ding, Xiangjian He, Xin Fan, Chi Lin, Fengqi Li, Tianzhu Wang, Zhongxuan Luo, Jiebo Luo

Volume 43, Issue 7, Pages 2400-2412. Publication type: Journal Article.

Citations: 6

Abstract

Data stream analysis aims to extract discriminative information for classification from continuously incoming samples. Detecting novel data while incrementally updating the model efficiently and stably is extremely challenging, especially for high-dimensional and/or large-scale data streams. This paper proposes an efficient framework for novelty detection and incremental learning on unlabeled chunk data streams. First, an accurate factorization-free kernel discriminative analysis (FKDA-X) is put forward, obtained by solving a linear system in the kernel space. FKDA-X produces a Reproducing Kernel Hilbert Space (RKHS) in which unlabeled chunk data can be detected as novel or classified into one of multiple known classes by a single decision model with a deterministic classification boundary. Moreover, based on FKDA-X, two optimized methods, FKDA-CX and FKDA-C, are proposed. FKDA-CX uses the micro-cluster centers of the original data as input to achieve excellent novelty detection performance. FKDA-C and incremental FKDA-C (IFKDA-C) use the class centers of the original data as input and are extremely fast in online learning. Theoretical analysis and experimental validation on under-sampled and large-scale real-world datasets demonstrate that the proposed algorithms can learn unlabeled chunk data streams at significantly lower computational cost than state-of-the-art approaches, with comparable accuracy.
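The general idea of a kernel discriminant model obtained from a linear system, with novelty flagged by distance to class centers, can be illustrated with a small sketch. This is not the paper's FKDA-X, FKDA-CX, or FKDA-C: it swaps in plain kernel ridge regression onto one-hot class targets, and the function names (`rbf_kernel`, `fit`, `predict`), the RBF kernel, the `ridge` term, and the distance `threshold` are all assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def fit(X, y, gamma=0.5, ridge=1e-3):
    """Fit coefficients by solving (K + ridge*I) alpha = Y (illustrative stand-in)."""
    classes = np.unique(y)
    Y = (y[:, None] == classes[None, :]).astype(float)  # one-hot class targets
    K = rbf_kernel(X, X, gamma)
    alpha = np.linalg.solve(K + ridge * np.eye(len(X)), Y)
    Z = K @ alpha  # training samples projected into the discriminant space
    centers = np.stack([Z[y == c].mean(axis=0) for c in classes])
    return {"X": X, "alpha": alpha, "classes": classes,
            "centers": centers, "gamma": gamma}

def predict(model, X_new, threshold=0.5):
    """Return (nearest known class, novelty flag) for each row of X_new."""
    Z = rbf_kernel(X_new, model["X"], model["gamma"]) @ model["alpha"]
    # Distance from each projected sample to each class center.
    d = np.linalg.norm(Z[:, None, :] - model["centers"][None, :, :], axis=2)
    labels = model["classes"][d.argmin(axis=1)]
    is_novel = d.min(axis=1) > threshold  # far from every known class center
    return labels, is_novel

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two known classes as well-separated blobs (synthetic data, for illustration).
    X = np.vstack([rng.normal(0.0, 0.3, (20, 2)), rng.normal(5.0, 0.3, (20, 2))])
    y = np.array([0] * 20 + [1] * 20)
    model = fit(X, y)
    chunk = np.array([[0.0, 0.0], [5.0, 5.0], [20.0, -20.0]])  # unlabeled chunk
    print(predict(model, chunk))
```

In this sketch a sample far from all training data has near-zero kernel values, so it projects close to the origin and away from every class center, which is what triggers the novelty flag. A single shared threshold is the simplest choice; a per-class threshold would be a natural variation.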

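Source journal

IEEE Transactions on Pattern Analysis and Machine Intelligence (Engineering & Technology - Engineering: Electrical & Electronic)

CiteScore: 28.40

Self-citation rate: 3.00%

Articles published: 885

Review time: 8.5 months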

About the journal: The IEEE Transactions on Pattern Analysis and Machine Intelligence publishes articles on all traditional areas of computer vision and image understanding, all traditional areas of pattern analysis and recognition, and selected areas of machine intelligence, with a particular emphasis on machine learning for pattern analysis. Areas such as techniques for visual search, document and handwriting analysis, medical image analysis, video and image sequence analysis, content-based retrieval of image and video, face and gesture recognition, and relevant specialized hardware and/or software architectures are also covered.