OMAL: A Multi-Label Active Learning Approach from Data Streams.

IF 2.1 | Zone 3, Physics and Astrophysics | JCR Q2, PHYSICS, MULTIDISCIPLINARY
Entropy Pub Date : 2025-03-29 DOI:10.3390/e27040363
Qiao Fang, Chen Xiang, Jicong Duan, Benallal Soufiyan, Changbin Shao, Xibei Yang, Sen Xu, Hualong Yu
Journal: Entropy, volume 27, issue 4. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12026165/pdf/
Citations: 0

Abstract

With the rapid growth of digital computing, communication, and storage devices across real-world scenarios, more and more data are being collected and stored, driving the development of machine learning techniques. The data that emerge in real-world applications also tend to become more complex. In this study, we consider one complex data type, multi-label data, acquired under a time constraint in a dynamic online scenario. Under such conditions, constructing a learning model faces two challenges: the model must adapt dynamically to changes in label correlations and to imbalanced data distributions, and labeling becomes more costly. To address these two issues, we propose a novel online multi-label active learning (OMAL) algorithm whose active query strategy combines uncertainty (the average entropy of the prediction probabilities) with diversity (the average cosine distance between feature vectors). Specifically, to capture label correlations, we use a classifier chain (CC) as the multi-label learning model and design a label co-occurrence ranking strategy to order the labels in the CC. To adapt to the naturally imbalanced distribution of multi-label data, we select the weighted extreme learning machine (WELM) as the base binary classifier in the CC. Experimental results on ten benchmark multi-label datasets transformed into streams show that the proposed method outperforms several popular static multi-label active learning algorithms in terms of both Macro-F1 and Micro-F1, indicating that it is specifically adapted to the dynamic data stream environment.
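
The two query criteria named in the abstract can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how uncertainty and diversity are combined, so the weighted sum, the `alpha` weight, the `threshold`, and all function names below are assumptions made for the example.

```python
import numpy as np

def mean_binary_entropy(probs):
    """Average binary entropy (in bits) over per-label positive-class probabilities."""
    p = np.clip(np.asarray(probs, dtype=float), 1e-12, 1 - 1e-12)
    h = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))
    return float(h.mean())

def mean_cosine_distance(x, labeled):
    """Average cosine distance between x and each row of the labeled feature matrix."""
    x = np.asarray(x, dtype=float)
    labeled = np.asarray(labeled, dtype=float)
    num = labeled @ x
    den = np.linalg.norm(labeled, axis=1) * np.linalg.norm(x)
    cos_sim = num / np.maximum(den, 1e-12)  # guard against zero-norm vectors
    return float((1.0 - cos_sim).mean())

def should_query(probs, x, labeled, alpha=0.5, threshold=0.5):
    """Assumed decision rule: query the label set when the weighted sum of
    uncertainty and diversity reaches the threshold (both are hypothetical knobs)."""
    uncertainty = mean_binary_entropy(probs)
    diversity = mean_cosine_distance(x, labeled)
    score = alpha * uncertainty + (1 - alpha) * diversity
    return score >= threshold, score
```

Under these assumptions, an arriving instance whose per-label probabilities sit near 0.5 and whose features point away from the already labeled instances gets a high score and is sent for labeling. A confidently predicted instance that resembles the labeled pool is skipped.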

Source journal: Entropy (PHYSICS, MULTIDISCIPLINARY)
CiteScore: 4.90
Self-citation rate: 11.10%
Articles published: 1580
Review time: 21.05 days
About the journal: Entropy (ISSN 1099-4300), an international and interdisciplinary journal of entropy and information studies, publishes reviews, regular research papers, and short notes. Our aim is to encourage scientists to publish their theoretical and experimental details as fully as possible. There is no restriction on the length of papers. Where computations or experiments are involved, the details must be provided so that the results can be reproduced.