Online imbalance learning with unpredictable feature evolution and label scarcity

Journal: Neurocomputing (Impact Factor 5.5; JCR Q1, Computer Science — Artificial Intelligence; CAS Tier 2)
DOI: 10.1016/j.neucom.2024.128476
Publication date: 2024-09-03
Article URL: https://www.sciencedirect.com/science/article/pii/S0925231224012475
Citations: 0

Abstract

Online learning from imbalanced data streams, in which the classes are unevenly distributed, has recently attracted wide attention. Existing approaches are conventionally built on a stationary feature space and assume that complete labels are available for supervised learning. In many real scenarios, however, such as environmental monitoring, new features flood in and old features are partially lost as the environment changes, because different sensors have different lifespans. Moreover, each instance must be labeled by experts, which is expensive and leads to label scarcity. To address these problems, this paper proposes a novel Online Imbalance learning with unpredictable Feature evolution and Label scarcity (OIFL) algorithm. First, we utilize margin-based online active learning to selectively label valuable instances. After obtaining the labels, we handle the imbalanced class distribution by optimizing the F-measure, recasting F-measure optimization as a weighted surrogate loss minimization. When data streams arrive with augmented features, we combine the online passive-aggressive algorithm with structural risk minimization to update the classifier in the divided feature space. When data streams arrive with incomplete features, we leverage variance to identify the most informative features following the empirical risk minimization principle, and continue to update the existing classifier as before. Finally, we obtain a sparse but reliable learner through a projection-truncation strategy. We derive theoretical analyses of OIFL and conduct experiments on synthetic datasets and real-world data streams to validate the effectiveness of our method.
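The abstract combines several standard online-learning ingredients: margin-based active querying, a class-weighted surrogate loss for imbalance, a passive-aggressive-style update, and truncation for sparsity. The following is a minimal toy sketch of how those pieces could fit together in one loop; all function and parameter names (`oifl_sketch`, `margin_thresh`, `pos_weight`, `trunc`) and their values are illustrative assumptions, not the paper's actual algorithm, and it ignores the feature-evolution machinery entirely.

```python
import numpy as np

def oifl_sketch(stream, margin_thresh=0.5, pos_weight=5.0, C=1.0, trunc=1e-3):
    """Toy online loop inspired by the ingredients described above:
    margin-based active querying, a class-weighted hinge surrogate,
    a passive-aggressive (PA-I) style update, and weight truncation.
    `stream` yields (x, y) pairs with y in {-1, +1}; the label is only
    "used" (queried) when the current margin is small."""
    w = None
    queried = 0
    for x, y in stream:
        x = np.asarray(x, dtype=float)
        if w is None:
            w = np.zeros_like(x)
        margin = w @ x
        if abs(margin) >= margin_thresh:
            continue                              # confident: skip labeling cost
        queried += 1                              # expert label requested here
        weight = pos_weight if y > 0 else 1.0     # emphasize the minority class
        loss = weight * max(0.0, 1.0 - y * margin)
        if loss > 0.0:
            tau = min(C, loss / (x @ x + 1e-12))  # PA-I step size, capped by C
            w += tau * y * x
            w[np.abs(w) < trunc] = 0.0            # truncate tiny weights -> sparsity
    return w, queried
```

On a linearly separable toy stream the loop queries labels only while the classifier is uncertain, so the query count stays well below the stream length once the margin grows.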

Source journal
Neurocomputing (Engineering & Technology — Computer Science: Artificial Intelligence)
CiteScore: 13.10
Self-citation rate: 10.00%
Articles per year: 1382
Review time: 70 days
Journal description: Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Theory, practice, and applications are the essential topics covered.