Ruirui Zhao , Yaqian You , Jianbin Sun , João Gama , Jiang Jiang
{"title":"Online learning from drifting capricious data streams with flexible Hoeffding tree","authors":"Ruirui Zhao , Yaqian You , Jianbin Sun , João Gama , Jiang Jiang","doi":"10.1016/j.ipm.2025.104221","DOIUrl":null,"url":null,"abstract":"<div><div>Capricious data streams, marked by random emergence and disappearance of features, are common in practical scenarios such as sensor networks. In existing research, they are mainly handled based on linear classifiers, feature correlation or ensemble of trees. There exist deficiencies such as limited learning capacity and high time cost. More importantly, the concept drift problem in them receives little attention. Therefore, drifting capricious data streams are focused on in this paper, and a new algorithm DCFHT (online learning from Drifting Capricious data streams with Flexible Hoeffding Tree) is proposed based on a single Hoeffding tree. DCFHT can achieve non-linear modeling and adaptation to drifts. First, DCFHT dynamically reuses and restructures the tree. The reusable information includes the tree structure and the information stored in each node. The restructuring process ensures that the Hoeffding tree dynamically aligns with the latest universal feature space. Second, DCFHT adapts to drifts in an informed way. When a drift is detected, DCFHT starts training a backup learner until it reaches the ability to replace the primary learner. Various experiments on 22 public and 15 synthetic datasets show that it is not only more accurate, but also maintains relatively low runtime on capricious data streams.</div></div>","PeriodicalId":50365,"journal":{"name":"Information Processing & Management","volume":"62 6","pages":"Article 104221"},"PeriodicalIF":7.4000,"publicationDate":"2025-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information Processing & Management","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0306457325001621","RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0
Abstract
Capricious data streams, marked by random emergence and disappearance of features, are common in practical scenarios such as sensor networks. In existing research, they are mainly handled based on linear classifiers, feature correlation or ensemble of trees. There exist deficiencies such as limited learning capacity and high time cost. More importantly, the concept drift problem in them receives little attention. Therefore, drifting capricious data streams are focused on in this paper, and a new algorithm DCFHT (online learning from Drifting Capricious data streams with Flexible Hoeffding Tree) is proposed based on a single Hoeffding tree. DCFHT can achieve non-linear modeling and adaptation to drifts. First, DCFHT dynamically reuses and restructures the tree. The reusable information includes the tree structure and the information stored in each node. The restructuring process ensures that the Hoeffding tree dynamically aligns with the latest universal feature space. Second, DCFHT adapts to drifts in an informed way. When a drift is detected, DCFHT starts training a backup learner until it reaches the ability to replace the primary learner. Various experiments on 22 public and 15 synthetic datasets show that it is not only more accurate, but also maintains relatively low runtime on capricious data streams.
期刊介绍:
Information Processing and Management is dedicated to publishing cutting-edge original research at the convergence of computing and information science. Our scope encompasses theory, methods, and applications across various domains, including advertising, business, health, information science, information technology marketing, and social computing.
We aim to cater to the interests of both primary researchers and practitioners by offering an effective platform for the timely dissemination of advanced and topical issues in this interdisciplinary field. The journal places particular emphasis on original research articles, research survey articles, research method articles, and articles addressing critical applications of research. Join us in advancing knowledge and innovation at the intersection of computing and information science.