Weak Multi-Label Data Stream Classification Under Distribution Changes in Labels
Authors: Yizhang Zou; Xuegang Hu; Peipei Li; Jun Hu
Journal: IEEE Transactions on Big Data, vol. 11, no. 3, pp. 1369-1380 (published 2024-09-03)
DOI: 10.1109/TBDATA.2024.3453760
URL: https://ieeexplore.ieee.org/document/10663953/
Multi-label stream classification addresses the challenge of dynamically assigning multiple labels to sequentially arriving instances. In practice, only some of an instance's labels can be observed because human annotation is expensive, and the distribution of the labels can change over the course of the stream, yet few existing works consider both challenges jointly. Motivated by this, we formulate the problem of weak multi-label stream classification (WMSC) and propose an online classification algorithm that is robust to weak labels. Specifically, we incrementally update a margin-based model using information from both the previous model and the current incoming instance with partially observed labels. To increase robustness to weak labels, we first adjust the classification margin of negative labels using a label causality matrix constructed from the conditional probabilities of label pairs. Second, we introduce a label prototype matrix that regulates the margin by controlling the weighting parameter of the slack term. Additionally, to handle potential distribution changes in labels, we perform binary classification with an instance-specific threshold obtained via online thresholding, which is formulated as a regression problem. Finally, theoretical analysis and experimental results demonstrate the effectiveness of WMSC in classifying unobserved streaming instances.
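The abstract says the label causality matrix is constructed from the conditional probabilities of label pairs but does not give the construction. The following is only a minimal sketch of one way such pairwise conditional probabilities could be estimated incrementally from a stream of partially observed label sets, using running co-occurrence counts. The class name `CooccurrenceEstimator`, its methods, and the count-based estimator are illustrative assumptions, not the authors' formulation.

```python
class CooccurrenceEstimator:
    """Running estimate of P(label j observed | label i observed) over a stream.

    Hypothetical helper for illustration; not taken from the paper.
    """

    def __init__(self, num_labels):
        self.num_labels = num_labels
        # pair_counts[i][j]: number of instances so far in which labels i and j
        # were both observed; the diagonal pair_counts[i][i] counts label i alone.
        self.pair_counts = [[0] * num_labels for _ in range(num_labels)]

    def update(self, observed_labels):
        """Fold the observed (possibly incomplete) label set of one instance into the counts."""
        labels = set(observed_labels)
        for i in labels:
            for j in labels:
                self.pair_counts[i][j] += 1

    def conditional(self, i, j):
        """Estimate P(j | i); returns 0.0 while label i has never been observed."""
        seen_i = self.pair_counts[i][i]
        return self.pair_counts[i][j] / seen_i if seen_i else 0.0

    def matrix(self):
        """Full matrix whose entry [i][j] is the current estimate of P(j | i)."""
        n = self.num_labels
        return [[self.conditional(i, j) for j in range(n)] for i in range(n)]


# Usage on a toy stream of four instances over three labels.
estimator = CooccurrenceEstimator(3)
for labels in [[0, 1], [0], [0, 1, 2], [2]]:
    estimator.update(labels)
print(estimator.matrix())
```

Because only observed labels are counted, missing labels bias these estimates downward. How the paper corrects for that, and how the matrix enters the margin adjustment for negative labels, is not specified in the abstract.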
About the journal:
The IEEE Transactions on Big Data publishes peer-reviewed articles focusing on big data. These articles present innovative research ideas and application results across disciplines, including novel theories, algorithms, and applications. Research areas cover a wide range, such as big data analytics, visualization, curation, management, semantics, infrastructure, standards, performance analysis, intelligence extraction, scientific discovery, security, privacy, and legal issues specific to big data. The journal also prioritizes applications of big data in fields generating massive datasets.