Incremental Data Drifting: Evaluation Metrics, Data Generation, and Approach Comparison

IF 7.2 4区 计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Yu-Tung Pai, Nien-En Sun, Cheng-Te Li, Shou-de Lin
{"title":"Incremental Data Drifting: Evaluation Metrics, Data Generation, and Approach Comparison","authors":"Yu-Tung Pai, Nien-En Sun, Cheng-Te Li, Shou-de Lin","doi":"10.1145/3655630","DOIUrl":null,"url":null,"abstract":"Incremental data drifting is a common problem when employing a machine-learning model in industrial applications. The underlying data distribution evolves gradually, e.g., users change their buying preferences on an E-commerce website over time. The problem needs to be addressed to obtain high performance. Right now, studies regarding incremental data drifting suffer from several issues. For one thing, there is a lack of clear-defined incremental drift datasets for examination. Existing efforts use either collected real datasets or synthetic datasets that show two obvious limitations. One is in particular when and of which type of drifts the distribution undergoes is unknown, and the other is that a simple synthesized dataset cannot reflect the complex representation we would normally face in the real world. For another, there lacks of a well-defined protocol to evaluate a learner’s knowledge transfer capability on an incremental drift dataset. To provide a holistic discussion on these issues, we create approaches to generate datasets with specific drift types, and define a novel protocol for evaluation. Besides, we investigate recent advances in the transfer learning field, including Domain Adaptation and Lifelong Learning, and examine how they perform in the presence of incremental data drifting. The results unfold the relationships among drift types, knowledge preservation, and learning approaches.","PeriodicalId":48967,"journal":{"name":"ACM Transactions on Intelligent Systems and Technology","volume":null,"pages":null},"PeriodicalIF":7.2000,"publicationDate":"2024-05-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Intelligent Systems and Technology","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/3655630","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

Abstract

Incremental data drifting is a common problem when employing a machine-learning model in industrial applications. The underlying data distribution evolves gradually, e.g., users change their buying preferences on an E-commerce website over time. The problem needs to be addressed to obtain high performance. Right now, studies regarding incremental data drifting suffer from several issues. For one thing, there is a lack of clear-defined incremental drift datasets for examination. Existing efforts use either collected real datasets or synthetic datasets that show two obvious limitations. One is in particular when and of which type of drifts the distribution undergoes is unknown, and the other is that a simple synthesized dataset cannot reflect the complex representation we would normally face in the real world. For another, there lacks of a well-defined protocol to evaluate a learner’s knowledge transfer capability on an incremental drift dataset. To provide a holistic discussion on these issues, we create approaches to generate datasets with specific drift types, and define a novel protocol for evaluation. Besides, we investigate recent advances in the transfer learning field, including Domain Adaptation and Lifelong Learning, and examine how they perform in the presence of incremental data drifting. The results unfold the relationships among drift types, knowledge preservation, and learning approaches.
增量数据漂移:评估指标、数据生成和方法比较
在工业应用中使用机器学习模型时,增量数据漂移是一个常见问题。底层数据分布会逐渐变化,例如,用户在电子商务网站上的购买偏好会随着时间的推移而改变。要想获得高性能,就必须解决这个问题。目前,有关增量数据漂移的研究存在几个问题。首先,缺乏明确定义的增量漂移数据集来进行研究。现有的研究要么使用收集的真实数据集,要么使用合成数据集,这两种数据集都有两个明显的局限性。一个是分布何时发生漂移以及发生哪种漂移尚不可知,另一个是简单的合成数据集无法反映我们在现实世界中通常会遇到的复杂情况。此外,还缺乏一个定义明确的协议来评估学习者在增量漂移数据集上的知识迁移能力。为了对这些问题进行全面讨论,我们创建了生成特定漂移类型数据集的方法,并定义了新颖的评估协议。此外,我们还研究了迁移学习领域的最新进展,包括领域适应和终身学习,并考察了它们在增量数据漂移情况下的表现。研究结果揭示了漂移类型、知识保存和学习方法之间的关系。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
ACM Transactions on Intelligent Systems and Technology
ACM Transactions on Intelligent Systems and Technology COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE-COMPUTER SCIENCE, INFORMATION SYSTEMS
CiteScore
9.30
自引率
2.00%
发文量
131
期刊介绍: ACM Transactions on Intelligent Systems and Technology is a scholarly journal that publishes the highest quality papers on intelligent systems, applicable algorithms and technology with a multi-disciplinary perspective. An intelligent system is one that uses artificial intelligence (AI) techniques to offer important services (e.g., as a component of a larger system) to allow integrated systems to perceive, reason, learn, and act intelligently in the real world. ACM TIST is published quarterly (six issues a year). Each issue has 8-11 regular papers, with around 20 published journal pages or 10,000 words per paper. Additional references, proofs, graphs or detailed experiment results can be submitted as a separate appendix, while excessively lengthy papers will be rejected automatically. Authors can include online-only appendices for additional content of their published papers and are encouraged to share their code and/or data with other readers.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信