基于匈牙利方法的光谱聚类零初始化主动学习

Dávid Papp
{"title":"基于匈牙利方法的光谱聚类零初始化主动学习","authors":"Dávid Papp","doi":"10.14232/actacyb.288006","DOIUrl":null,"url":null,"abstract":"Supervised machine learning tasks often require a large number of labeled training data to set up a model, and then prediction - for example the classification - is carried out based on this model. Nowadays tremendous amount of data is available on the web or in data warehouses, although only a portion of those data is annotated and the labeling process can be tedious, expensive and time consuming. Active learning tries to overcome this problem by reducing the labeling cost through allowing the learning system to iteratively select the data from which it learns. In special case of active learning, the process starts from zero initialized scenario, where the labeled training dataset is empty, and therefore only unsupervised methods can be performed. In this paper a novel query strategy framework is presented for this problem, called Clustering Based Balanced Sampling Framework (CBBSF), which is not only select the initial labeled training dataset, but uniformly selects the items among the categories to get a balanced labeled training dataset. The framework includes an assignment technique to implicitly determine the class membership probabilities. Assignment solution is updated during CBBSF iterations, hence it simulates supervised machine learning more accurately as the process progresses. The proposed Spectral Clustering Based Sampling (SCBS) query startegy realizes the CBBSF framework, and therefore it is applicable in the special zero initialized situation. This selection approach uses ClusterGAN (Clustering using Generative Adversarial Networks) integrated in the spectral clustering algorithm and then it selects an unlabeled instance depending on the class membership probabilities. Global and local versions of SCBS were developed, furthermore, most confident and minimal entropy measures were calculated, thus four different SCBS variants were examined in total. Experimental evaluation was conducted on the MNIST dataset, and the results showed that SCBS outperforms the state-of-the-art zero initialized active learning query strategies.","PeriodicalId":187125,"journal":{"name":"Acta Cybern.","volume":"17 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Zero Initialized Active Learning with Spectral Clustering using Hungarian Method\",\"authors\":\"Dávid Papp\",\"doi\":\"10.14232/actacyb.288006\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Supervised machine learning tasks often require a large number of labeled training data to set up a model, and then prediction - for example the classification - is carried out based on this model. Nowadays tremendous amount of data is available on the web or in data warehouses, although only a portion of those data is annotated and the labeling process can be tedious, expensive and time consuming. Active learning tries to overcome this problem by reducing the labeling cost through allowing the learning system to iteratively select the data from which it learns. In special case of active learning, the process starts from zero initialized scenario, where the labeled training dataset is empty, and therefore only unsupervised methods can be performed. In this paper a novel query strategy framework is presented for this problem, called Clustering Based Balanced Sampling Framework (CBBSF), which is not only select the initial labeled training dataset, but uniformly selects the items among the categories to get a balanced labeled training dataset. The framework includes an assignment technique to implicitly determine the class membership probabilities. Assignment solution is updated during CBBSF iterations, hence it simulates supervised machine learning more accurately as the process progresses. The proposed Spectral Clustering Based Sampling (SCBS) query startegy realizes the CBBSF framework, and therefore it is applicable in the special zero initialized situation. This selection approach uses ClusterGAN (Clustering using Generative Adversarial Networks) integrated in the spectral clustering algorithm and then it selects an unlabeled instance depending on the class membership probabilities. Global and local versions of SCBS were developed, furthermore, most confident and minimal entropy measures were calculated, thus four different SCBS variants were examined in total. Experimental evaluation was conducted on the MNIST dataset, and the results showed that SCBS outperforms the state-of-the-art zero initialized active learning query strategies.\",\"PeriodicalId\":187125,\"journal\":{\"name\":\"Acta Cybern.\",\"volume\":\"17 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-12-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Acta Cybern.\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.14232/actacyb.288006\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Acta Cybern.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.14232/actacyb.288006","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

监督式机器学习任务通常需要大量标记的训练数据来建立模型,然后根据该模型进行预测(例如分类)。如今,在网络上或数据仓库中有大量的数据,尽管这些数据中只有一部分被注释了,而且标记过程可能是乏味、昂贵和耗时的。主动学习试图克服这个问题,通过允许学习系统迭代地选择学习的数据来减少标记成本。在主动学习的特殊情况下,该过程从零初始化场景开始,其中标记的训练数据集是空的,因此只能执行无监督方法。针对这一问题,本文提出了一种新的查询策略框架——基于聚类的平衡采样框架(CBBSF),该框架不仅选择初始标记训练数据集,而且在分类中统一选择项目,从而得到一个平衡标记训练数据集。该框架包括一种赋值技术来隐式地确定类隶属概率。分配解决方案在CBBSF迭代期间更新,因此随着过程的进行,它更准确地模拟了监督机器学习。提出的基于谱聚类采样(SCBS)查询策略实现了CBBSF框架,因此适用于特殊的零初始化情况。该方法将ClusterGAN (Clustering using Generative Adversarial Networks)集成到谱聚类算法中,然后根据类隶属度概率选择一个未标记的实例。开发了SCBS的全球和本地版本,此外,计算了最自信和最小熵度量,从而总共检查了四种不同的SCBS变体。在MNIST数据集上进行了实验评估,结果表明SCBS优于最先进的零初始化主动学习查询策略。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Zero Initialized Active Learning with Spectral Clustering using Hungarian Method
Supervised machine learning tasks often require a large number of labeled training data to set up a model, and then prediction - for example the classification - is carried out based on this model. Nowadays tremendous amount of data is available on the web or in data warehouses, although only a portion of those data is annotated and the labeling process can be tedious, expensive and time consuming. Active learning tries to overcome this problem by reducing the labeling cost through allowing the learning system to iteratively select the data from which it learns. In special case of active learning, the process starts from zero initialized scenario, where the labeled training dataset is empty, and therefore only unsupervised methods can be performed. In this paper a novel query strategy framework is presented for this problem, called Clustering Based Balanced Sampling Framework (CBBSF), which is not only select the initial labeled training dataset, but uniformly selects the items among the categories to get a balanced labeled training dataset. The framework includes an assignment technique to implicitly determine the class membership probabilities. Assignment solution is updated during CBBSF iterations, hence it simulates supervised machine learning more accurately as the process progresses. The proposed Spectral Clustering Based Sampling (SCBS) query startegy realizes the CBBSF framework, and therefore it is applicable in the special zero initialized situation. This selection approach uses ClusterGAN (Clustering using Generative Adversarial Networks) integrated in the spectral clustering algorithm and then it selects an unlabeled instance depending on the class membership probabilities. Global and local versions of SCBS were developed, furthermore, most confident and minimal entropy measures were calculated, thus four different SCBS variants were examined in total. Experimental evaluation was conducted on the MNIST dataset, and the results showed that SCBS outperforms the state-of-the-art zero initialized active learning query strategies.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信