Neighborhood relation-based incremental label propagation algorithm for partially labeled hybrid data

IF 2.9 | Zone 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Machine Learning Pub Date : 2024-06-19 DOI:10.1007/s10994-024-06560-9

Wenhao Shu, Dongtao Cao, Wenbin Qian, Shipeng Li


Citations: 0

Abstract

Label propagation can quickly predict the labels of unlabeled objects from a small amount of given label information, which can improve the performance of downstream machine learning tasks. Most existing label propagation methods are designed for static data. In many applications, however, real datasets contain multiple feature value types and massive numbers of unlabeled objects, and they change dynamically over time. Applying static label propagation methods to such dynamic, partially labeled hybrid data is costly, because the labels must be recomputed from scratch every time the data changes. To improve efficiency, this paper develops a novel incremental label propagation algorithm based on neighborhood relations (ILPN). Specifically, we first construct graph structures using neighborhood relations to eliminate unnecessary label information. Then, a new label propagation strategy is designed that takes into account the weights assigned to each class, so that it does not rely on a probabilistic transition matrix to fix the structure for propagation. On this basis, a new label propagation algorithm called neighborhood relation-based label propagation (LPN) is developed. For dynamic partially labeled hybrid data, we integrate incremental learning into LPN and develop an updating mechanism that performs incremental label propagation over the previous propagation results and graph structures rather than recalculating from scratch. Finally, extensive experiments on UCI datasets show that the proposed LPN algorithm is faster than other label propagation algorithms while maintaining accuracy. In particular, on simulated dynamic data, the incremental algorithm ILPN is more efficient than non-incremental methods as the partially labeled hybrid data varies.
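The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea, not the authors' LPN or ILPN. It assumes a simple hybrid distance (absolute difference for numeric features scaled to [0, 1], 0/1 mismatch for categorical ones), a neighborhood radius `delta`, and a class weight equal to the summed closeness of labeled neighbors. The names `hybrid_distance`, `NeighborhoodLP`, `fit`, and `add` are hypothetical. The `add` method illustrates the incremental step: a new object is linked only to its neighbors, and labels are propagated from that local frontier instead of being recomputed for the whole graph.

```python
from collections import defaultdict


def hybrid_distance(a, b):
    """Mean per-feature distance for mixed data (an assumed metric).

    Numeric features are assumed to be scaled to [0, 1];
    categorical features (strings) contribute 0 on match, 1 on mismatch.
    """
    total = 0.0
    for x, y in zip(a, b):
        if isinstance(x, str) or isinstance(y, str):
            total += 0.0 if x == y else 1.0
        else:
            total += abs(x - y)
    return total / len(a)


class NeighborhoodLP:
    """Illustrative neighborhood-graph label propagation with incremental adds."""

    def __init__(self, delta=0.2):
        self.delta = delta            # neighborhood radius (assumed parameter)
        self.data = []                # feature tuples
        self.labels = []              # class label, or None if unlabeled
        self.adj = defaultdict(dict)  # i -> {j: distance} for neighbors only

    def _link(self, i):
        # Connect object i to every earlier object within radius delta.
        for j in range(i):
            d = hybrid_distance(self.data[i], self.data[j])
            if d <= self.delta:
                self.adj[i][j] = d
                self.adj[j][i] = d

    def _propagate(self, frontier):
        # Label unlabeled frontier objects from labeled neighbors, then
        # expand to the unlabeled neighbors of whatever was just labeled.
        frontier = set(frontier)
        while frontier:
            updates = {}
            for i in frontier:
                if self.labels[i] is not None:
                    continue  # given and already-propagated labels stay fixed
                weights = defaultdict(float)
                for j, d in self.adj[i].items():
                    if self.labels[j] is not None:
                        weights[self.labels[j]] += 1.0 - d  # closer counts more
                if weights:
                    # sorted() makes tie-breaking deterministic
                    updates[i] = max(sorted(weights), key=weights.get)
            for i, lab in updates.items():
                self.labels[i] = lab
            frontier = {j for i in updates for j in self.adj[i]
                        if self.labels[j] is None}

    def fit(self, data, labels):
        for x, y in zip(data, labels):
            self.data.append(tuple(x))
            self.labels.append(y)
        for i in range(len(self.data)):
            self._link(i)
        self._propagate(range(len(self.data)))
        return self

    def add(self, x, label=None):
        # Incremental update: touch only the new object and its neighbors.
        i = len(self.data)
        self.data.append(tuple(x))
        self.labels.append(label)
        self._link(i)
        self._propagate({i} | set(self.adj[i]))
        return self.labels[i]


data = [(0.10, "a"), (0.15, "a"), (0.90, "b"), (0.85, "b"), (0.20, "a")]
given = ["pos", None, "neg", None, None]
model = NeighborhoodLP(delta=0.1).fit(data, given)
new_label = model.add((0.80, "b"))  # reuses the existing graph and labels
```

In this sketch, labels that were already propagated are never revised, which keeps the incremental step local but is a simplification; an object with no neighbors inside `delta` simply stays unlabeled.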

Source journal

Machine Learning (Engineering & Technology; Computer Science: Artificial Intelligence)

CiteScore: 11.00

Self-citation rate: 2.70%

Articles published: 162

Review time: 3 months

Journal description: Machine Learning serves as a global platform dedicated to computational approaches in learning. The journal reports substantial findings on diverse learning methods applied to various problems, offering support through empirical studies, theoretical analysis, or connections to psychological phenomena. It demonstrates the application of learning methods to solve significant problems and aims to enhance the conduct of machine learning research with a focus on verifiable and replicable evidence in published papers.