Improving Semi-Supervised Clustering Algorithms with Active Query Selection

JCR quartile: Q3 (Engineering)
Walid Atwa, M. Emam
Journal: Advances in Systems Science and Applications, volume 19 1 (as listed), pages 25-44
Published: 2019-12-20
Publication type: Journal Article
DOI: 10.25728/ASSA.2019.19.4.659
URL: https://doi.org/10.25728/ASSA.2019.19.4.659
Citations: 1

Abstract

Semi-supervised clustering algorithms use a small amount of supervised data, in the form of pairwise constraints, to improve clustering performance. However, most current algorithms are passive: the pairwise constraints are provided beforehand and selected at random. This can lead to constraints that are redundant, unnecessary, or even harmful to the clustering results. In this paper, we address the problem of constraint selection to improve the performance of semi-supervised clustering algorithms. Based on the concept of Maximum Mean Discrepancy (MMD), we select a batch of the most informative instances, namely those that minimize the difference in distribution between the labeled and unlabeled data. We then query these instances against the existing neighborhoods to determine which neighborhood each instance belongs to. Experimental comparisons with state-of-the-art methods on several real-world datasets demonstrate the effectiveness and efficiency of the proposed method.
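
The abstract gives no formulas, so the following is only a minimal Python sketch of the two steps it names, under stated assumptions rather than the authors' implementation. It assumes a Gaussian (RBF) kernel, a biased MMD estimate, a simple greedy search for the batch, and a hypothetical `oracle(i, j)` callback that answers whether instances `i` and `j` must be linked. The names `rbf_kernel`, `mmd2`, `select_batch`, `assign_neighborhood`, and the `gamma` parameter are all illustrative.

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    sq = (np.sum(a ** 2, axis=1)[:, None]
          + np.sum(b ** 2, axis=1)[None, :]
          - 2.0 * a @ b.T)
    return np.exp(-gamma * np.maximum(sq, 0.0))

def mmd2(x, y, gamma=1.0):
    """Biased estimate of the squared Maximum Mean Discrepancy of x and y."""
    return (rbf_kernel(x, x, gamma).mean()
            + rbf_kernel(y, y, gamma).mean()
            - 2.0 * rbf_kernel(x, y, gamma).mean())

def select_batch(data, labeled_idx, batch_size, gamma=1.0):
    """Greedily pick instances whose addition to the labeled set brings its
    distribution closest (in MMD) to the remaining unlabeled data.
    Assumption: greedy search stands in for whatever optimizer the paper uses."""
    chosen = list(labeled_idx)
    picked = []
    for _ in range(batch_size):
        pool = [i for i in range(len(data)) if i not in chosen]
        if len(pool) < 2:  # need at least one instance left as "unlabeled"
            break

        def score(i):
            rest = [j for j in pool if j != i]
            return mmd2(data[chosen + [i]], data[rest], gamma)

        best = min(pool, key=score)
        chosen.append(best)
        picked.append(best)
    return picked

def assign_neighborhood(idx, neighborhoods, oracle):
    """Query idx against one representative of each existing neighborhood.
    The first must-link answer decides membership; if every answer is
    cannot-link, idx starts a new neighborhood. `oracle` is hypothetical."""
    for hood in neighborhoods:
        if oracle(idx, hood[0]):
            hood.append(idx)
            return
    neighborhoods.append([idx])
```

In this sketch each selected instance costs at most one query per existing neighborhood, and every answer yields a pairwise constraint (must-link or cannot-link) that a semi-supervised clustering algorithm could then consume.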
Source journal
Advances in Systems Science and Applications (Engineering: Engineering (all))
CiteScore: 1.20
Self-citation rate: 0.00%
Articles published: 0
Journal description: Advances in Systems Science and Applications (ASSA) is an international peer-reviewed open-source online academic journal. Its scope covers all major aspects of systems (and processes) analysis, modeling, simulation, and control, ranging from theoretical and methodological developments to a large variety of application areas. Survey articles and innovative results are also welcome. ASSA is aimed at scientists, engineers, and researchers working on these problems. It is intended as a platform on which researchers can communicate and discuss both their specialized issues and interdisciplinary problems of systems analysis and its applications in science and industry, including data science, artificial intelligence, material science, manufacturing, transportation, power and energy, ecology, corporate management, public governance, finance, and many others.