Zilean: A modularized framework for large-scale temporal concept drift type classification

Impact factor: 8.1 · CAS Tier 1 (Computer Science) · JCR category: COMPUTER SCIENCE, INFORMATION SYSTEMS

Information Sciences, vol. 712, Article 122134. Published: 2025-03-28. DOI: 10.1016/j.ins.2025.122134

Zhao Deng, QuanXi Feng, Bin Lin, Gary G. Yen

Citations: 0

Abstract

In the analysis of time series data, particularly in real-world applications, concept drift classification is crucial for enabling models to adapt in a differentiated manner to future data. To address the challenge of identifying diverse types of drift, we propose Zilean, a novel framework that integrates feature-based and predictor-based techniques while accounting for drift residues and fragmentation during repeated drift detection. The framework incorporates the pre-trained BERT-Base language model into its classifier design, leveraging deep learning for automatic drift classification and eliminating the need for judgment curve analysis. To evaluate its performance, experiments were conducted on a variety of real-world and synthetic datasets, each exhibiting different types of concept drift. The results show that on real-world datasets, our framework achieves a classification accuracy of 91.03%, outperforming XGBoost by 7.94% and surpassing TCN-CNN by 4.28%. Additionally, experiments exploring a frozen parameter strategy and the use of a more lightweight language model, DistilBERT, further enhance accuracy to 96.93% and 97.17%, respectively. These findings underscore the framework's effectiveness in large-scale temporal concept drift classification.
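The abstract says Zilean integrates feature-based and predictor-based drift detection but does not describe either component. The Python sketch below is a rough illustration only and is not the authors' implementation. It pairs one textbook example from each family. The feature-based check runs a two-sample Kolmogorov-Smirnov (KS) test between a reference window and a recent window of input values. The predictor-based check is the Drift Detection Method (DDM) of Gama et al. (2004), which monitors a model's stream of 0/1 prediction errors. The names `ks_statistic`, `feature_drift`, and `DDM` are assumptions chosen for the example, as are the significance level (alpha = 0.05), the warning and drift levels (2 and 3 standard deviations), and the 30-sample warm-up. The sketch does not model Zilean's handling of drift residues and fragmentation, or its BERT-based type classifier.

```python
# Illustrative sketch (assumption): textbook detectors, not Zilean's actual components.
import math
from typing import Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the empirical CDFs."""
    xs, ys = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(xs) and j < len(ys):
        x = min(xs[i], ys[j])
        # Step both empirical CDFs past every sample equal to x (handles ties).
        while i < len(xs) and xs[i] <= x:
            i += 1
        while j < len(ys) and ys[j] <= x:
            j += 1
        d = max(d, abs(i / len(xs) - j / len(ys)))
    return d


def feature_drift(reference: Sequence[float], recent: Sequence[float], alpha: float = 0.05) -> bool:
    """Feature-based signal: flag drift if the recent inputs differ in distribution from the reference."""
    n, m = len(reference), len(recent)
    # Asymptotic critical value of the two-sample KS test at significance level alpha.
    c_alpha = math.sqrt(-math.log(alpha / 2) / 2)
    critical = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, recent) > critical


class DDM:
    """Predictor-based signal: Drift Detection Method (Gama et al., 2004) on a 0/1 error stream."""

    def __init__(self, warning_level: float = 2.0, drift_level: float = 3.0, min_samples: int = 30):
        self.warning_level = warning_level
        self.drift_level = drift_level
        self.min_samples = min_samples
        self.reset()

    def reset(self) -> None:
        self.n = 0
        self.p = 0.0  # running error rate
        self.s = 0.0  # binomial standard deviation of the error rate
        self.p_min = math.inf
        self.s_min = math.inf

    def update(self, error: int) -> str:
        """Feed one outcome (1 = wrong, 0 = right); return 'stable', 'warning' or 'drift'."""
        self.n += 1
        self.p += (error - self.p) / self.n
        self.s = math.sqrt(self.p * (1 - self.p) / self.n)
        if self.n < self.min_samples:
            return "stable"
        if self.p + self.s < self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, self.s
        if self.p + self.s >= self.p_min + self.drift_level * self.s_min:
            self.reset()  # relearn the error level of the new concept from scratch
            return "drift"
        if self.p + self.s >= self.p_min + self.warning_level * self.s_min:
            return "warning"
        return "stable"


if __name__ == "__main__":
    reference = [i / 100 for i in range(100)]
    shifted = [0.5 + i / 100 for i in range(100)]
    print("feature drift:", feature_drift(reference, shifted))

    # Synthetic error stream: 10% errors for 300 steps, then 50% errors.
    errors = [1 if i % 10 == 0 else 0 for i in range(300)]
    errors += [1 if i % 2 == 0 else 0 for i in range(300, 600)]
    detector = DDM()
    print("drift at:", [i for i, e in enumerate(errors) if detector.update(e) == "drift"])
```

In the demo, the KS check flags the shifted input window. DDM stays quiet through the 10% error phase and reports a single drift shortly after the switch to 50% errors at step 300. Detectors like these only signal that drift occurred. In a framework like the one described, deciding which type of drift it is would be left to the classifier.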

Source journal

Information Sciences (Engineering & Technology: Computer Science, Information Systems)

CiteScore: 14.00

Self-citation rate: 17.30%

Articles published: 1322

Review time: 10.4 months

Journal description: Information Sciences (Informatics and Computer Science, Intelligent Systems, Applications) is an esteemed international journal that publishes original and creative research findings in the field of information sciences. We also feature a limited number of timely tutorial and survey contributions. Our journal aims to serve a diverse audience, including researchers, developers, managers, strategic planners, graduate students, and anyone interested in staying up to date with cutting-edge research in information science, knowledge engineering, and intelligent systems. While readers are expected to share a common interest in information science, they come from varied backgrounds such as engineering, mathematics, statistics, physics, computer science, cell biology, molecular biology, management science, cognitive science, neurobiology, behavioral sciences, and biochemistry.