利用漂移类型适应概念漂移

IF 4.8 3区计算机科学 Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

ACM Transactions on Knowledge Discovery from Data Pub Date : 2024-01-02 DOI:10.1145/3638777

Jinpeng Li, Hang Yu, zhenyuzhang, Xiangfeng Luo, Shaorong Xie

{"title":"利用漂移类型适应概念漂移","authors":"Jinpeng Li, Hang Yu, zhenyuzhang, Xiangfeng Luo, Shaorong Xie","doi":"10.1145/3638777","DOIUrl":null,"url":null,"abstract":"<p>Concept drift is a phenomenon where the distribution of data streams changes over time. When this happens, model predictions become less accurate. Hence, models built in the past need to be re-learned for the current data. Two design questions need to be addressed in designing a strategy to re-learn models: which type of concept drift has occurred, and how to utilize the drift type to improve re-learning performance. Existing drift detection methods are often good at determining when drift has occurred. However, few retrieve information about how the drift came to be present in the stream. Hence, determining the impact of the type of drift on adaptation is difficult. Filling this gap, we designed a framework based on a lazy strategy called Type-Driven Lazy Drift Adaptor (Type-LDA). Type-LDA first retrieves information about both how and when a drift has occurred, then it uses this information to re-learn the new model. To identify the type of drift, a drift type identifier is pre-trained on synthetic data of known drift types. Further, a drift point locator locates the optimal point of drift via a sharing loss. Hence, Type-LDA can select the optimal point, according to the drift type, to re-learn the new model. Experiments validate Type-LDA on both synthetic data and real-world data, and the results show that accurately identifying drift type can improve adaptation accuracy.</p>","PeriodicalId":49249,"journal":{"name":"ACM Transactions on Knowledge Discovery from Data","volume":"23 1","pages":""},"PeriodicalIF":4.8000,"publicationDate":"2024-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Concept Drift Adaptation by Exploiting Drift Type\",\"authors\":\"Jinpeng Li, Hang Yu, zhenyuzhang, Xiangfeng Luo, Shaorong Xie\",\"doi\":\"10.1145/3638777\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Concept drift is a phenomenon where the distribution of data streams changes over time. When this happens, model predictions become less accurate. Hence, models built in the past need to be re-learned for the current data. Two design questions need to be addressed in designing a strategy to re-learn models: which type of concept drift has occurred, and how to utilize the drift type to improve re-learning performance. Existing drift detection methods are often good at determining when drift has occurred. However, few retrieve information about how the drift came to be present in the stream. Hence, determining the impact of the type of drift on adaptation is difficult. Filling this gap, we designed a framework based on a lazy strategy called Type-Driven Lazy Drift Adaptor (Type-LDA). Type-LDA first retrieves information about both how and when a drift has occurred, then it uses this information to re-learn the new model. To identify the type of drift, a drift type identifier is pre-trained on synthetic data of known drift types. Further, a drift point locator locates the optimal point of drift via a sharing loss. Hence, Type-LDA can select the optimal point, according to the drift type, to re-learn the new model. Experiments validate Type-LDA on both synthetic data and real-world data, and the results show that accurately identifying drift type can improve adaptation accuracy.</p>\",\"PeriodicalId\":49249,\"journal\":{\"name\":\"ACM Transactions on Knowledge Discovery from Data\",\"volume\":\"23 1\",\"pages\":\"\"},\"PeriodicalIF\":4.8000,\"publicationDate\":\"2024-01-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Transactions on Knowledge Discovery from Data\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1145/3638777\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Knowledge Discovery from Data","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/3638777","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}

引用次数: 0

摘要

概念漂移是指数据流的分布随时间发生变化的现象。当这种情况发生时，模型预测的准确性就会降低。因此，需要针对当前数据重新学习过去建立的模型。在设计重新学习模型的策略时，需要解决两个设计问题：发生了哪种类型的概念漂移，以及如何利用漂移类型来提高重新学习性能。现有的漂移检测方法通常都能很好地确定漂移发生的时间。但是，很少有方法能检索到漂移是如何出现在数据流中的信息。因此，很难确定漂移类型对适应性的影响。为了填补这一空白，我们设计了一个基于懒惰策略的框架，称为类型驱动的懒惰漂移适配器（Type-LDA）。Type-LDA 首先检索漂移发生的方式和时间，然后利用这些信息重新学习新模型。为了识别漂移类型，漂移类型识别器会在已知漂移类型的合成数据上进行预训练。此外，漂移点定位器通过共享损失来定位最佳漂移点。因此，Type-LDA 可以根据漂移类型选择最佳点来重新学习新模型。实验在合成数据和真实世界数据上验证了 Type-LDA，结果表明，准确识别漂移类型可以提高适应精度。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Concept Drift Adaptation by Exploiting Drift Type

Concept drift is a phenomenon where the distribution of data streams changes over time. When this happens, model predictions become less accurate. Hence, models built in the past need to be re-learned for the current data. Two design questions need to be addressed in designing a strategy to re-learn models: which type of concept drift has occurred, and how to utilize the drift type to improve re-learning performance. Existing drift detection methods are often good at determining when drift has occurred. However, few retrieve information about how the drift came to be present in the stream. Hence, determining the impact of the type of drift on adaptation is difficult. Filling this gap, we designed a framework based on a lazy strategy called Type-Driven Lazy Drift Adaptor (Type-LDA). Type-LDA first retrieves information about both how and when a drift has occurred, then it uses this information to re-learn the new model. To identify the type of drift, a drift type identifier is pre-trained on synthetic data of known drift types. Further, a drift point locator locates the optimal point of drift via a sharing loss. Hence, Type-LDA can select the optimal point, according to the drift type, to re-learn the new model. Experiments validate Type-LDA on both synthetic data and real-world data, and the results show that accurately identifying drift type can improve adaptation accuracy.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

ACM Transactions on Knowledge Discovery from Data COMPUTER SCIENCE, INFORMATION SYSTEMS-COMPUTER SCIENCE, SOFTWARE ENGINEERING

CiteScore

6.70

自引率

5.60%

发文量

172

审稿时长

3 months

期刊介绍： TKDD welcomes papers on a full range of research in the knowledge discovery and analysis of diverse forms of data. Such subjects include, but are not limited to: scalable and effective algorithms for data mining and big data analysis, mining brain networks, mining data streams, mining multi-media data, mining high-dimensional data, mining text, Web, and semi-structured data, mining spatial and temporal data, data mining for community generation, social network analysis, and graph structured data, security and privacy issues in data mining, visual, interactive and online data mining, pre-processing and post-processing for data mining, robust and scalable statistical methods, data mining languages, foundations of data mining, KDD framework and process, and novel applications and infrastructures exploiting data mining technology including massively parallel processing and cloud computing platforms. TKDD encourages papers that explore the above subjects in the context of large distributed networks of computers, parallel or multiprocessing computers, or new data devices. TKDD also encourages papers that describe emerging data mining applications that cannot be satisfied by the current data mining technology.