Authors: Qiao Fang, Chen Xiang, Jicong Duan, Benallal Soufiyan, Changbin Shao, Xibei Yang, Sen Xu, Hualong Yu
Journal: Entropy, vol. 27, no. 4. Published 2025-03-29. Impact factor 2.1 (JCR Q2, Physics, Multidisciplinary).
DOI: https://doi.org/10.3390/e27040363
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12026165/pdf/
OMAL: A Multi-Label Active Learning Approach from Data Streams.
With the rapid growth of digital computing, communication, and storage devices deployed in real-world scenarios, ever more data are being collected and stored, driving the development of machine learning techniques. The data that emerge in real-world applications also tend to become more complex. In this study, we consider one such complex data type, multi-label data, acquired under a time constraint in a dynamic online scenario. Under such conditions, constructing a learning model faces two challenges: the model must dynamically adapt to variations in label correlations and to imbalanced data distributions, and labeling becomes more costly. To address these two issues, we propose a novel online multi-label active learning (OMAL) algorithm whose active query strategy simultaneously considers uncertainty (measured by the average entropy of the prediction probabilities) and diversity (measured by the average cosine distance between feature vectors). Specifically, to capture label correlations, we use a classifier chain (CC) as the multi-label learning model and design a label co-occurrence ranking strategy to order the labels in the CC. To adapt to the naturally imbalanced distribution of multi-label data, we select the weighted extreme learning machine (WELM) as the base binary classifier in the CC. Experimental results on ten benchmark multi-label datasets that were transformed into streams show that the proposed method outperforms several popular static multi-label active learning algorithms in terms of both the Macro-F1 and Micro-F1 metrics, indicating that it is specifically adapted to the dynamic data stream environment.
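The two query criteria named in the abstract can be illustrated with a minimal Python sketch. The abstract states only that uncertainty is the average entropy of the prediction probabilities and that diversity is the average cosine distance between feature vectors. Everything else here is an assumption made for illustration and is not taken from the paper: the per-label binary entropy, the comparison against a pool of already-labeled instances, the linear weighting `alpha`, the query `threshold`, and all function names.

```python
import math


def binary_entropy(p):
    """Shannon entropy (in bits) of a single Bernoulli(p) label prediction."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def average_entropy(label_probs):
    """Uncertainty: mean per-label entropy of the predicted probabilities."""
    return sum(binary_entropy(p) for p in label_probs) / len(label_probs)


def cosine_distance(u, v):
    """1 - cosine similarity; treated as 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def average_cosine_distance(x, labeled_pool):
    """Diversity: mean cosine distance from x to the already-labeled instances."""
    if not labeled_pool:
        return 1.0  # assumption: with nothing labeled yet, x is maximally novel
    return sum(cosine_distance(x, z) for z in labeled_pool) / len(labeled_pool)


def should_query(x, label_probs, labeled_pool, alpha=0.5, threshold=0.5):
    """Decide whether to request labels for a newly arrived instance x.

    alpha and threshold are hypothetical parameters; the abstract does not
    say how the two criteria are combined or thresholded.
    """
    score = (alpha * average_entropy(label_probs)
             + (1.0 - alpha) * average_cosine_distance(x, labeled_pool))
    return score >= threshold, score
```

In a streaming setting, `should_query` would be called once per arriving instance, with `label_probs` coming from the current classifier chain. An instance whose predictions are confident and whose features resemble the labeled pool scores low and is skipped, which is how a scheme of this kind would limit labeling cost.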
About the journal:
Entropy (ISSN 1099-4300), an international and interdisciplinary journal of entropy and information studies, publishes reviews, regular research papers, and short notes. Our aim is to encourage scientists to publish their theoretical and experimental details in as much depth as possible. There is no restriction on the length of papers. Where computations or experiments are involved, the details must be provided so that the results can be reproduced.